
TinyGPU Cluster

The RRZE's TinyGPU cluster is an experimental cluster for developing and benchmarking applications using GPUs as accelerators.

  • 8 compute nodes, each with two Xeon 5550 "Nehalem" chips (8 cores + SMT per node) running at 2.66 GHz with 8 MB shared cache per chip, 24 GB of RAM (DDR3-1333), and 200 GB of local scratch disk; two NVIDIA Tesla M1060 GPU boards in every node

  • 1 compute node with two Xeon 5650 "Westmere" chips (12 cores + SMT per node) running at 2.66 GHz with 12 MB shared cache per chip, 48 GB of RAM (DDR3-1333), and 500 GB of local scratch disk; two NVIDIA Tesla C2070 GPU boards (plus two varying other GPUs)

  • InfiniBand interconnect fabric with 20 GBit/s bandwidth per link and direction

Jobs that request less than a full node are currently not supported by RRZE and may be killed without notice. Therefore, always use ppn=16 in the node specification for qsub (or ppn=24 for the fermi queue).

This website shows information regarding the following topics:

Access, User Environment, and File Systems

Access to the machine

Access to TinyGPU is through the Woody frontends. Connect to

woody.rrze.uni-erlangen.de

and you will be randomly routed to one of the frontends for Woody, as there are no separate frontends for TinyGPU. See the documentation for the Woodcrest cluster for information about these frontends. Although the TinyGPU compute nodes actually run Ubuntu LTS, the environment is compatible: programs compiled for Woody will also run on TinyGPU. In most cases, you can even compile CUDA programs on the Woody frontends (after loading the cuda module), although no GPU hardware is available there. In case of problems, try compiling your GPU programs on one of the TinyGPU compute nodes (e.g. within an interactive job).

To submit jobs, you have to use the command qsub.tinygpu instead of the normal qsub.

In general, the documentation for Woody applies. This page lists only the differences from Woody.
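
As a rough illustration of how a submission command could be put together, the following sketch builds the argument list for qsub.tinygpu. The helper name build_qsub_command and the walltime option are assumptions made for this example; only qsub.tinygpu, the whole-node ppn values (16, or 24 for the fermi queue), and the queue name "fermi" are taken from this page, and the -l/-q options follow ordinary Torque/PBS qsub syntax. The script only prints the command and does not submit anything.

```python
import shlex


def build_qsub_command(job_script, nodes=1, queue=None, walltime=None):
    """Return the argument list for submitting job_script to TinyGPU.

    Hypothetical helper for illustration. Only the command name, the ppn
    values and the "fermi" queue name come from this page; the walltime
    option is an assumption based on ordinary Torque/PBS usage.
    """
    # Whole nodes only: 16 logical cores on the Nehalem nodes,
    # 24 on the Westmere node behind the "fermi" queue.
    ppn = 24 if queue == "fermi" else 16
    cmd = ["qsub.tinygpu", "-l", f"nodes={nodes}:ppn={ppn}"]
    if walltime is not None:
        cmd += ["-l", f"walltime={walltime}"]
    if queue is not None:
        cmd += ["-q", queue]
    cmd.append(job_script)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(shlex.join(build_qsub_command("job.sh", walltime="01:00:00")))
    print(shlex.join(build_qsub_command("job.sh", queue="fermi")))
```

Deriving ppn from the queue keeps the whole-node rule in one place, so a job script cannot accidentally request a partial node.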

File Systems

Parallel file system $FASTTMP

The parallel filesystem $FASTTMP in /wsfs is currently not available on TinyGPU.

Node-local storage $TMPDIR

Each node has at least 200 GB of local hard drive capacity for temporary files (instead of the 130 GB on Woody), available under /tmp/ (also accessible via /scratch/).

Compiling and running CUDA codes

Unfortunately, due to the experimental nature of this cluster, the proper way of doing this is still in flux. Please contact hpc-support if you need assistance. In many cases, however, you will find most of the required information by looking at the (default) cuda module (e.g. module show cuda).

Batch Processing

The batch system works just like on Woody. The few notable differences are:

  • The command for job submission is qsub.tinygpu instead of just qsub.
  • The compute nodes do not have 4 cores like Woody, but 8 physical cores plus 8 SMT cores, so the operating system sees 16 cores. At the moment, you generally have to request ppn=16 (or ppn=24 for the fermi queue), even if you need fewer cores and regardless of the number of GPUs used per node. A different mechanism may be established in the future, so check this documentation regularly for updates.
  • If you want to get the node with the C2070 GPUs (tg010), you have to submit your job to the queue "fermi", i.e. use qsub -q fermi ....
  • With Nehalem, Intel has reintroduced the concept of Hyper-Threading, although it is now called simultaneous multithreading (SMT), and this time it is actually useful for some applications. You should test whether your application runs better or worse with SMT. To run a job without SMT, you still have to request all 16 cores of a node (see the item on ppn above), and then restrict your program to the 8 "real" cores. The "real" cores on TinyGPU are numbered 0-7: cores 0-3 are on the first physical socket, 4-7 on the second; 8-15 are the corresponding virtual cores created by SMT. If you use mpirun, you can pass the parameters -npernode 8 -pin "0 1 2 3 4 5 6 7" to restrict your program to the right cores.
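
The core numbering described in the last item can be sketched as follows. The function names are hypothetical, and the sketch assumes that "corresponding" means physical core i is paired with virtual core i + 8 on the Nehalem nodes; the page does not state the pairing explicitly. The mpirun parameters are the ones quoted above, and os.sched_setaffinity is the Linux-only standard-library call for pinning a non-MPI process.

```python
import os

PHYSICAL_CORES = 8  # Nehalem nodes: "real" cores are numbered 0-7


def physical_cores(n_physical=PHYSICAL_CORES):
    """Return the IDs of the physical cores, socket 0 first, then socket 1."""
    return list(range(n_physical))


def smt_sibling(core, n_physical=PHYSICAL_CORES):
    """Return the virtual (SMT) core paired with a physical core.

    Assumption: physical core i is paired with logical core i + n_physical.
    """
    if not 0 <= core < n_physical:
        raise ValueError(f"{core} is not a physical core ID")
    return core + n_physical


def mpirun_args(program, n_physical=PHYSICAL_CORES):
    """Build the mpirun argument list quoted on this page."""
    pin = " ".join(str(c) for c in physical_cores(n_physical))
    return ["mpirun", "-npernode", str(n_physical), "-pin", pin, program]


def pin_to_physical_cores():
    """Restrict the current (non-MPI) process to cores 0-7. Linux only."""
    os.sched_setaffinity(0, set(physical_cores()))


if __name__ == "__main__":
    print(mpirun_args("./a.out"))
```

Running the same benchmark once pinned to cores 0-7 and once on all 16 logical cores is a simple way to check whether SMT helps a given application.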

Further Information

Intel Xeon 5550 "Nehalem" Processor

The Xeon 5550 processor implements Intel's Nehalem microarchitecture and is a quad-core chip running at 2.66 GHz. The most significant improvements compared to the Core 2 based chips (as used, e.g., in our Woodcrest cluster) have been made to the memory interface. In addition, the chips can dynamically overclock themselves as long as they stay within their thermal envelope.

The memory controllers are no longer in the chipset, but integrated into the CPU, a concept that is familiar from the Opteron CPUs of Intel's competitor AMD. Intel has, however, decided to go the whole hog: each CPU has no less than three independent memory channels, which leads to a vastly improved memory bandwidth compared to Core 2 based CPUs like the Woodcrest. Please note that this improvement really only applies to the memory interface. Applications that run mostly from the cache do not run better on Nehalem than on Woodcrest.

The physical CPU sockets are coupled through an interconnect called QPI. As the memory is now attached directly to the CPUs, accesses to the memory of the other socket have to go through QPI and the other processor, so they are more expensive and slower. In other words, the Nehalems are CC-NUMA machines.
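
To put a number on the three memory channels, the theoretical peak bandwidth can be estimated from the DDR3-1333 memory listed in the node description. This is a back-of-the-envelope sketch: the function name is hypothetical, the 64-bit (8-byte) channel width is the standard DDR3 value rather than something stated on this page, and sustained bandwidth in real applications is lower than this peak.

```python
def peak_bandwidth_gb_s(transfers_per_s, bus_bytes=8, channels=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * bus_bytes * channels / 1e9


DDR3_1333 = 1333e6  # nominal transfers per second for DDR3-1333

per_channel = peak_bandwidth_gb_s(DDR3_1333)
per_socket = peak_bandwidth_gb_s(DDR3_1333, channels=3)

if __name__ == "__main__":
    print(f"per channel: {per_channel:.1f} GB/s")
    print(f"per socket (3 channels): {per_socket:.1f} GB/s")
```

Under these assumptions, each socket peaks at roughly 32 GB/s, three times the figure for a single channel.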

InfiniBand Interconnect Fabric

The InfiniBand network on TinyGPU is a double data rate (DDR) network, i.e. the links run at 20 GBit/s in each direction. All 8 nodes are connected to a small DDR switch and can thus communicate with each other in a fully non-blocking manner.
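
The 20 GBit/s figure is the signalling rate of the link. As a rough sketch of what this means for payload data, the calculation below applies the 8b/10b line encoding that InfiniBand DDR links use. The function name is hypothetical, and protocol overhead above the line encoding is ignored, so achievable MPI bandwidth is somewhat lower still.

```python
def ib_payload_gbit_s(signalling_gbit_s, payload_bits=8, line_bits=10):
    """Usable data rate after 8b/10b line encoding, in Gbit/s."""
    return signalling_gbit_s * payload_bits / line_bits


signalling = 20                               # Gbit/s per link and direction
payload_gbit = ib_payload_gbit_s(signalling)  # Gbit/s of actual data
payload_gbyte = payload_gbit / 8              # GB/s of actual data

if __name__ == "__main__":
    print(f"{payload_gbit:.0f} Gbit/s data rate, {payload_gbyte:.0f} GB/s per direction")
```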

Last modified: 30 March 2011


RRZE - Regionales RechenZentrum Erlangen, Martensstraße 1, D-91058 Erlangen | Tel.: +49 9131 8527031 | Fax: +49 9131 302941
