Tinyblue Cluster

RRZE's Tinyblue cluster (IBM) is a high-performance compute resource with a high-speed interconnect. It is intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements.
- 84 compute nodes, each with two Xeon 5550 "Nehalem" chips (8 cores + SMT per node) running at 2.66 GHz with 8 MB shared cache per chip, 12 GB of RAM (DDR3-1333), and 200 GB of local scratch disk
- InfiniBand interconnect fabric with 40 GBit/s bandwidth per link and direction
- Overall peak performance of ca. 7 TFlop/s (?.?? TFlop/s LINPACK)
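The quoted peak figure can be cross-checked with a short calculation. The following is a minimal sketch, assuming that each Nehalem core completes 4 double-precision floating-point operations per cycle. That rate is not stated on this page and is an assumption here, as is the helper name peak_tflops.

```python
# Rough cross-check of the quoted peak performance.
NODES = 84             # compute nodes (from the list above)
CORES_PER_NODE = 8     # two quad-core chips per node; SMT threads add no extra peak flops
CLOCK_GHZ = 2.66       # nominal clock frequency
FLOPS_PER_CYCLE = 4    # ASSUMPTION: double-precision flops per core per cycle


def peak_tflops(nodes=NODES, cores=CORES_PER_NODE,
                clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Return the theoretical peak performance in TFlop/s."""
    gflops = nodes * cores * clock_ghz * flops_per_cycle
    return gflops / 1000.0


if __name__ == "__main__":
    print(f"Theoretical peak: {peak_tflops():.2f} TFlop/s")
```

Under these assumptions the result is about 7.15 TFlop/s, which is consistent with the "ca. 7 TFlop/s" quoted above.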
Tinyblue is designed for running parallel programs that use significantly more than one node. Jobs using less than one node are not supported by RRZE and may be killed without notice.
Access, User Environment, and File Systems
Access to the machine
Users can connect to woody.rrze.uni-erlangen.de and will be randomly routed to one of the frontends for Woody, as there are no separate frontends for Tinyblue. See the documentation for the Woodcrest cluster for information about these frontends. Although the Tinyblue compute nodes actually run Ubuntu LTS, the environment is compatible. There is no difference in compiling for Woody or Tinyblue, i.e. programs compiled for Woody will run on Tinyblue as well.
To submit jobs, you have to use the command qsub.tinyblue instead of the normal qsub.
In general, the documentation for Woody applies. This page lists only the differences from Woody.
File Systems
Parallel file system $FASTTMP
The parallel file system $FASTTMP in /wsfs is currently not available on Tinyblue.
Node-local storage $TMPDIR
Each node has 200 GB of local hard drive capacity for temporary files (instead of the 130 GB Woody has), available under /tmp/ (also accessible via /scratch/).
Batch Processing
The batch system works just like on Woody. The few notable differences are:
- The command for job submission is qsub.tinyblue instead of just qsub.
- The compute nodes do not have 4 cores like Woody, but 8 physical cores plus 8 SMT cores. This means that the operating system will see 16 cores. You thus always have to request ppn=16 on every qsub.
- With the Nehalem, Intel has reintroduced the concept of Hyper-Threading, although they now call it simultaneous multithreading (SMT), and it actually is useful for some applications this time. You should test whether your application runs better or worse with SMT. To run a job without using SMT, you still have to request all 16 cores of a node (see the previous item) and then restrict your program to the 8 "real" ones. The "real" cores on Tinyblue are the ones numbered 0-7. Core numbers 0-3 are the first physical socket, 4-7 the second; 8-15 are the corresponding virtual cores created by SMT. If you use mpirun, you can just use the parameters -npernode 8 -pin "0 1 2 3 4 5 6 7" to restrict your program to the right cores.
- Effective June 2010, jobs requesting more than 32 nodes will wait in the route queue until the big queue is enabled. The big queue will usually be activated only once or twice per week to avoid draining Tinyblue for short-running huge jobs.
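The core numbering and the resulting command lines can be captured in a few helper functions. This is only an illustrative sketch: the function names are hypothetical, the pairing of core i with virtual core i+8 is one reading of "corresponding", and the assumption that qsub.tinyblue accepts the usual "-l nodes=N:ppn=M" resource syntax is not confirmed on this page.

```python
# Core layout of one Tinyblue node as described above:
#   0-3 first socket, 4-7 second socket, 8-15 virtual (SMT) cores.
PHYSICAL_CORES = 8
CORES_PER_SOCKET = 4
LOGICAL_CORES = 16


def _check(core):
    if not 0 <= core < LOGICAL_CORES:
        raise ValueError(f"core number must be in 0..{LOGICAL_CORES - 1}")


def is_smt_core(core):
    """True for the virtual cores 8-15 created by SMT."""
    _check(core)
    return core >= PHYSICAL_CORES


def socket_of(core):
    """Socket (0 or 1) of a logical core.

    ASSUMPTION: virtual core i+8 shares a physical core with core i.
    """
    _check(core)
    return (core % PHYSICAL_CORES) // CORES_PER_SOCKET


def mpirun_no_smt_args():
    """mpirun parameters from the text: 8 processes per node on cores 0-7."""
    cores = " ".join(str(c) for c in range(PHYSICAL_CORES))
    return f'-npernode {PHYSICAL_CORES} -pin "{cores}"'


def qsub_command(nodes, script):
    """Build a submission command that always requests ppn=16.

    ASSUMPTION: qsub.tinyblue takes the usual '-l nodes=N:ppn=M' option.
    """
    if nodes < 1:
        raise ValueError("at least one full node must be requested")
    return ["qsub.tinyblue", "-l", f"nodes={nodes}:ppn={LOGICAL_CORES}", script]


if __name__ == "__main__":
    # 'job.sh' is a placeholder script name.
    print(" ".join(qsub_command(4, "job.sh")))
    print("mpirun", mpirun_no_smt_args(), "./a.out")
```

Keeping ppn fixed at 16 in one place makes it harder to submit a job that accidentally requests only part of a node.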
Further Information
Intel Xeon 5550 "Nehalem" Processor
The Xeon 5550 processor implements Intel's Nehalem microarchitecture and is a quad-core chip running at 2.66 GHz. Compared to the Core 2 based chips (as used, e.g., in our Woodcrest cluster), the most significant improvements have been made to the memory interface. In addition, the chips can dynamically overclock themselves as long as they stay within their thermal envelope.
The memory controllers are no longer in the chipset but integrated into the CPU, a concept that is familiar from the Opteron CPUs of Intel's competitor AMD. Intel has however decided to go the whole hog: each CPU has no fewer than three independent memory channels, which leads to a vastly improved memory bandwidth compared to Core 2 based CPUs like the Woodcrest. Please note that this improvement really only applies to the memory interface. Applications that run mostly from the cache do not run better on Nehalem than on Woodcrest.
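To get a feeling for what three memory channels mean, the theoretical peak bandwidth can be estimated as follows. This is a back-of-the-envelope sketch, assuming a 64-bit (8-byte) wide channel and a nominal 1333 million transfers per second for DDR3-1333; neither value is stated on this page, and the function name is hypothetical.

```python
# Theoretical peak memory bandwidth of one Nehalem socket and one node.
CHANNELS_PER_SOCKET = 3        # from the text above
SOCKETS_PER_NODE = 2           # two chips per node
TRANSFERS_PER_SECOND = 1333e6  # ASSUMPTION: nominal DDR3-1333 transfer rate
BYTES_PER_TRANSFER = 8         # ASSUMPTION: 64-bit wide memory channel


def peak_bandwidth_gbs(channels=CHANNELS_PER_SOCKET,
                       transfers_per_second=TRANSFERS_PER_SECOND,
                       bytes_per_transfer=BYTES_PER_TRANSFER):
    """Return the theoretical peak bandwidth in GB/s (10^9 bytes/s)."""
    return channels * transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    per_socket = peak_bandwidth_gbs()
    print(f"per socket: {per_socket:.1f} GB/s")
    print(f"per node:   {SOCKETS_PER_NODE * per_socket:.1f} GB/s")
```

This gives roughly 32 GB/s per socket and 64 GB/s per node as a theoretical upper bound; the bandwidth that real applications sustain will be lower.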
The physical CPU sockets are coupled by a link called QPI (QuickPath Interconnect). As the memory is now attached directly to the CPUs, accesses to the memory of the other socket have to go through QPI and the other processor, so they are slower than accesses to local memory. In other words, the Nehalem nodes are CC-NUMA machines.
InfiniBand Interconnect Fabric
The InfiniBand network on Tinyblue is a quad data rate (QDR) network, i.e. the links run at 40 GBit/s in each direction. It is fully non-blocking, i.e. the backbone is capable of handling the maximum amount of traffic coming in through the client ports without any congestion. However, InfiniBand still uses static routing: once a route is established between two nodes, it does not change even if the load on the backbone links changes. It is therefore possible to generate traffic patterns that cause congestion on individual links. This is, however, unlikely to happen with normal user jobs.
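The 40 GBit/s figure is the signalling rate of the link, not the rate at which user data can be moved. The following sketch estimates the usable data rate, assuming the 8b/10b line encoding used by QDR InfiniBand (8 data bits per 10 bits on the wire). The encoding is not mentioned on this page and is an assumption here, as is the function name; protocol overhead is ignored.

```python
# Estimate the usable data rate of one QDR link in one direction.
SIGNALLING_GBIT = 40   # link rate per direction (from the text above)
DATA_BITS = 8          # ASSUMPTION: 8b/10b encoding, 8 payload bits ...
LINE_BITS = 10         # ... for every 10 bits transmitted on the wire


def data_rate_gbyte_per_s(signalling_gbit=SIGNALLING_GBIT,
                          data_bits=DATA_BITS, line_bits=LINE_BITS):
    """Return the data rate in GByte/s after removing encoding overhead."""
    data_gbit = signalling_gbit * data_bits / line_bits
    return data_gbit / 8  # 8 bits per byte


if __name__ == "__main__":
    print(f"{data_rate_gbyte_per_s():.1f} GByte/s per link and direction")
```

Under this assumption, about 4 GByte/s per link and direction remain as an upper bound for MPI traffic; measured bandwidths will be somewhat lower.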



