LBM: Optimized Implementations of the Lattice Boltzmann Method in 3D
Lattice Boltzmann methods (LBM) are popular for the numerical simulation of incompressible flows. This project aims to investigate and optimize simple lattice Boltzmann kernels for different architectures, including both commodity "off-the-shelf" architectures and tailored HPC systems such as vector computers. We cover modern 64-bit processors ranging from IA32-compatible (Intel Xeon/Nocona, AMD Opteron), superscalar RISC (IBM Power4), and IA64 (Intel Itanium 2) to classical vector (NEC SX6) and novel vector (Cray X1) architectures.
In the course of this project, we supervised the Bachelor theses of Stefan Donath and Johannes Habich and published several papers.
The Bachelor thesis of Stefan Donath, as well as the first report (see section Papers below), examines the influence of different memory layouts on
the performance of simple lattice Boltzmann kernels. Reordering the data of the array used by the kernel made it possible to supersede standard
cache-optimization techniques such as spatial blocking.
Stefan Donath presented his results at the SIAM Conference on Computational Science & Engineering
2005 in Orlando, Florida.
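The effect of a memory layout can be illustrated with plain index arithmetic. The following is a minimal sketch, not the code from the thesis: it assumes a D3Q19 model (19 distribution values per cell), and the function names `index_aos`, `index_soa`, and `stride_between_x_neighbors` are hypothetical. It compares a layout in which all values of one cell are contiguous with a layout in which all values of one direction are contiguous.

```python
Q = 19  # assumption: D3Q19 model, 19 distribution values per cell


def index_aos(x, y, z, q, nx, ny, nz):
    """Array-of-structures layout: the Q values of one cell are contiguous."""
    return ((z * ny + y) * nx + x) * Q + q


def index_soa(x, y, z, q, nx, ny, nz):
    """Structure-of-arrays layout: each direction q fills one contiguous block."""
    return ((q * nz + z) * ny + y) * nx + x


def stride_between_x_neighbors(index_fn, nx, ny, nz, q=0):
    """Distance in array elements between two cells adjacent in x, same q."""
    return index_fn(1, 0, 0, q, nx, ny, nz) - index_fn(0, 0, 0, q, nx, ny, nz)


if __name__ == "__main__":
    for name, fn in (("AoS", index_aos), ("SoA", index_soa)):
        print(name, stride_between_x_neighbors(fn, 8, 8, 8))
```

In the first layout, a sweep along x for a fixed direction touches every 19th element, whereas in the second it walks through memory with unit stride. Which of the two performs better depends on the kernel and the architecture.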
The parallelization and scaling behavior of LBM was examined in a second part of this project. Extensive experiments with both OpenMP and MPI on different contemporary terascale architectures were carried out and published, e.g., at the Supercomputing Conference 2004 and the Parallel CFD Conference 2005.
Optimization and Application of 3D LBM for complex structures
In a further stage of the project, we investigated optimization possibilities for an LBM code for complex structures. In cooperation
with the Lattice Boltzmann Development Consortium, a data representation that stores only fluid cells, omitting obstacle data, was
examined. The results, obtained with memory traversal along space-filling curves, were presented at the ASIM Conference 2005.
This part of the project was partially funded by the
Bavarian Graduate School for Computational Engineering which is part of the Elitenetzwerk Bayern.
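The idea of storing only fluid cells can be sketched as follows. This is an illustrative example under stated assumptions, not the representation used in the project: the velocity set is reduced to the six axis directions for brevity, cells are numbered in simple lexicographic order rather than along a space-filling curve, and the names `DIRECTIONS` and `build_fluid_list` are hypothetical.

```python
# Assumption: only the six axis directions, to keep the example short.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def build_fluid_list(obstacle):
    """Build a fluid-cell-only representation.

    obstacle[z][y][x] is True for solid cells. Returns (coords, neighbors):
    coords[i] is the (x, y, z) position of fluid cell i, and neighbors[i][d]
    is the index of the fluid cell in direction d, or -1 if that neighbor
    is an obstacle or lies outside the domain.
    """
    nz, ny, nx = len(obstacle), len(obstacle[0]), len(obstacle[0][0])
    cell_id = {}
    coords = []
    # Number the fluid cells; obstacle cells receive no storage at all.
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not obstacle[z][y][x]:
                    cell_id[(x, y, z)] = len(coords)
                    coords.append((x, y, z))
    # Replace implicit grid neighborhood by an explicit adjacency list.
    neighbors = [
        [cell_id.get((x + dx, y + dy, z + dz), -1) for dx, dy, dz in DIRECTIONS]
        for (x, y, z) in coords
    ]
    return coords, neighbors


if __name__ == "__main__":
    grid = [[[False, True], [False, False]]]  # 2 x 2 x 1 domain, one obstacle
    coords, neighbors = build_fluid_list(grid)
    print(coords)
    print(neighbors)
```

Because neighbors are looked up through the adjacency list, the order in which fluid cells are numbered can be changed freely, for example to follow a space-filling curve, without touching the kernel that uses the list.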
To account for
the increasing complexity of continuous surfaces, the Bachelor
thesis of Johannes Habich implemented more advanced and
accurate second-order boundary conditions. The performance
impact of the additional calculations and of
different fluid-to-obstacle ratios was
investigated. This led to the implementation of a
compressed list storage format, which was thoroughly tested
for performance with different spatial
blocking factors. A shared-memory parallelization was added to
exploit today's increased medium-grained parallelism.
Optimized GPU (Graphics Processing Unit) Implementations of the Lattice Boltzmann Method in 3D
Special-purpose accelerators have been an
emerging topic in recent years. To evaluate the effort of
implementing numerical kernels and the benefit to be expected, the
Master thesis of Johannes Habich implemented several
benchmarks to gain hands-on knowledge of the initial
implementation effort and of optimization techniques on the
currently available nVIDIA GeForce G80 GPU. The massive
thread-level parallelism requires a new way of parallel programming,
which is supported by the nVIDIA CUDA framework.
The well-known STREAM benchmark was implemented
and demonstrated the potential of the memory subsystem. The
implementation of a lattice Boltzmann fluid flow solver
gave deep insights into pitfalls of the hardware and led to
sophisticated optimization techniques that are generally
applicable.
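A measured memory bandwidth, such as a STREAM result, can be turned into a rough upper bound for a memory-bound lattice Boltzmann kernel. The sketch below is an estimate under stated assumptions, not a result from the thesis: it assumes a D3Q19 model in which every cell update loads and stores all 19 values once, the helper names `bytes_per_update` and `max_mlups` are hypothetical, and the bandwidth figure in the usage example is a made-up placeholder.

```python
def bytes_per_update(q=19, word_bytes=8, write_allocate=True):
    """Memory traffic of one cell update for a simple two-grid kernel.

    Assumption: q loads plus q stores per update. On cache-based CPUs a
    store may first load the cache line (write allocate), adding q more
    transfers.
    """
    transfers = 2 * q + (q if write_allocate else 0)
    return transfers * word_bytes


def max_mlups(bandwidth_gb_per_s, **kwargs):
    """Upper bound in million lattice site updates per second (MLUPS)."""
    return bandwidth_gb_per_s * 1e9 / bytes_per_update(**kwargs) / 1e6


if __name__ == "__main__":
    # Hypothetical bandwidth value, single precision, no write allocate.
    print(max_mlups(76.0, word_bytes=4, write_allocate=False))
```

Comparing the measured kernel performance with such a bound indicates how much of the available memory bandwidth an implementation actually uses.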
In cooperation with the
Department of Computer Science 10 (Systemsimulation),
a new kernel was derived that is better suited for deployment in an MPI-parallelized heterogeneous framework called
widely applicable Lattice Boltzmann from Erlangen (waLBerla).
An in-depth analysis of the computation and communication patterns led to a very efficient and fast solver, which is now being developed towards different kinds of applications, e.g., particulate flows.
A major concern in comparison to standalone solver development is that different communication networks lead to inevitable performance drawbacks. Optimizing these communication stages is
the most important part of performance optimization.
Acknowledgements
This project is partially funded by
KONWIHR (Competence Network for Technical, Scientific
High Performance Computing in Bavaria).
By cooperation with the
Department of Computer Science 10 (Systemsimulation)
and the
Chair of Fluid Dynamics
we ensure that the project stays as close as
possible to engineering demands. Furthermore, we are
working together with
Peter Lammers at HLRS
and
Jörg Bernsdorf of German Research School for Simulation Sciences GmbH.
This project is partially funded by
SKALB (Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen).
Infos & Talks
Papers
-
Gerhard Wellein, Thomas Zeiser, Stefan Donath, Georg Hager
On the Single Processor Performance of Simple Lattice Boltzmann Kernels
Computers & Fluids, 35:8-9 (2006) 910-919
PDF-File
-
Thomas Pohl, Nils Thürey, Frank Deserno, Ulrich Rüde, Peter Lammers, Gerhard Wellein, Thomas Zeiser
Performance Evaluation of Parallel Large-Scale Lattice Boltzmann Applications on Three Supercomputing Architectures
accepted for Supercomputing Conference, 2004.
PDF-File
-
Peter Lammers, Gerhard Wellein, Thomas Zeiser, Georg Hager, Michael Breuer
Have the vectors the continuing ability to parry the attack of the killer micros?
accepted for Proceedings of the 2nd Teraflop-Workshop at HLRS, March 2005.
-
Gerhard Wellein, Thomas Zeiser, Peter Lammers, Uwe Küster
Towards Optimal Performance for Lattice Boltzmann Applications on Terascale Computers
accepted for Parallel CFD Conference, 2005.
PDF-File
-
Stefan Donath, Thomas Zeiser, Georg Hager, Johannes Habich, Gerhard Wellein
Optimizing Performance of the Lattice Boltzmann Method for Complex Structures on Cache-based Architectures
In Proceedings "Frontiers in Simulation: Simulationstechnique - 18th Symposium in Erlangen, September 2005 (ASIM)" (Editors: F. Hülsemann, M. Kowarschik, U. Rüde), SCS Publishing House, Erlangen, 2005, Pages 728-735.
PDF-File
-
Johannes Habich, Thomas Zeiser, Georg Hager, Gerhard Wellein
Speeding up a Lattice Boltzmann Kernel on nVIDIA GPUs.
In Proceedings of the "First International Conference on Parallel, Distributed and Grid Computing for Engineering, April 2009, Pecs, Hungary, PARENG09-S01" (Editors: B.H.V. Topping and P. Ivanyi), Civil-Comp Press, Stirling, 2009.
External Link
Technical Reports
-
Thomas Zeiser, Gerhard Wellein, Georg Hager, Stefan Donath, Frank Deserno, Peter Lammers, Monika Wierse
Optimized Lattice Boltzmann Kernels as Testbeds for Processor Performance
PDF-File
Master Theses
-
Performance Evaluation of Numeric Compute Kernels on nVIDIA GPUs
Thesis (PDF-File)
(Johannes Habich)
supervised by:
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Dr. Thomas Zeiser, Dr. Georg Hager, Stefan Donath.
July 2008.
Bachelor Theses
-
On Optimized Implementations of the Lattice Boltzmann Method on Contemporary High Performance Architectures
Thesis (PDF-File)
(Stefan Donath)
supervised by:
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Thomas Zeiser, Georg Hager, Frank Deserno.
August 2004.
-
Improving computational efficiency of Lattice Boltzmann methods on complex geometries
Thesis (PDF-File)
(Johannes Habich)
supervised by:
Prof. Ulrich Rüde, Dr. Gerhard Wellein, Thomas Zeiser, Georg Hager.
February 2006.
Talks
-
Gerhard Wellein
Optimization Approaches and Performance Characteristics of Lattice Boltzmann Kernels
invited talk, International Conference for Mesoscopic Methods in Engineering and Science, Braunschweig, July 28, 2004.
-
Stefan Donath
On Optimized Implementations of the Lattice Boltzmann Method on Contemporary High Performance Architectures

Presentation Slides
SIAM CSE05 Conference, Orlando, February 2005.
-
Gerhard Wellein
Architecture and Performance of Terascale Computers
International Conference on Parallel Computational Fluid Dynamics, Maryland, May 24-27, 2005.
-
Stefan Donath
Optimizing Performance of the Lattice Boltzmann Method for Complex Structures
ASIM Conference, Erlangen, September 2005.
-
Gerhard Wellein
Efficient implementations of simple lattice Boltzmann kernels
Short Course

Presentation Slides
International Conference for Mesoscopic Methods in Engineering and Science (ICMMES) 2006, Hampton/Norfolk, July 24, 2006.



