Reputation: 71
I would like to find a suitable GPU acceleration package for the Lattice Boltzmann Method (LBM) or for conventional Navier-Stokes CFD.
CUDA is tied to Nvidia devices, so I have already ruled it out.
According to https://arxiv.org/abs/1704.05316, OpenCL is around 3 times faster than OpenMP for CFD.
However, that paper includes no comparison for LBM.
OpenCL is also roughly twice as hard to code.
I am now deciding between OpenCL and OpenMP. How large is the performance difference between the two on LBM problems?
Upvotes: 3
Views: 947
Reputation: 5736
I have implemented LBM in OpenCL; see my master's thesis. From testing my code on various GPUs and CPUs, and from comparing its performance with other multi-CPU implementations, I can say that LBM on one GPU is about as fast as on ~2000-7000 CPU cores. The performance benefit really is massive, because LBM efficiency on CPUs is extremely poor for all CPU codes (~10-50%). On the GPU, LBM is bottlenecked solely by memory bandwidth, which is orders of magnitude larger than on CPUs.
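To illustrate what a memory-bandwidth bottleneck means for throughput, here is a rough roofline-style sketch in Python. It assumes that every cell update loads and stores each of the 19 D3Q19 distribution functions exactly once in FP32, and it ignores extra traffic such as flag fields. The function names and the 1000 GB/s bandwidth figure are illustrative assumptions, not measurements or values taken from any particular code.

```python
def bytes_per_cell_update(q=19, bytes_per_value=4):
    """Memory traffic per lattice cell update (assumed model).

    Assumes one load and one store per distribution function and
    no other traffic; real codes may move slightly more data.
    """
    return 2 * q * bytes_per_value  # D3Q19, FP32: 2 * 19 * 4 = 152 bytes


def peak_mlups(bandwidth_gb_s, q=19, bytes_per_value=4):
    """Upper bound on million lattice updates per second (MLUPs)
    for a purely bandwidth-bound kernel."""
    bytes_per_second = bandwidth_gb_s * 1e9
    updates_per_second = bytes_per_second / bytes_per_cell_update(q, bytes_per_value)
    return updates_per_second / 1e6


def efficiency(measured_mlups, bandwidth_gb_s, q=19, bytes_per_value=4):
    """Fraction of the bandwidth-limited peak that a measured rate reaches."""
    return measured_mlups / peak_mlups(bandwidth_gb_s, q, bytes_per_value)


if __name__ == "__main__":
    bandwidth = 1000.0  # GB/s, hypothetical device bandwidth for illustration
    print(f"bytes per update: {bytes_per_cell_update()}")
    print(f"bandwidth-limited peak: {peak_mlups(bandwidth):.0f} MLUPs")
```

Under this model, dividing a measured MLUPs figure by `peak_mlups` for the device's specified bandwidth gives a hardware-efficiency estimate; the exact byte count per update depends on the implementation.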
Also, on the Nvidia A100/V100 I get 97%/100% hardware efficiency (8800/5250 MLUPs/s for D3Q19 and FP32), so OpenCL carries no performance disadvantage compared to CUDA. I have verified that my code runs on Nvidia/AMD/Intel GPUs and Intel CPUs; it even runs on the Mali-G72 GPU of my smartphone.
So yes, I definitely recommend going with OpenCL for LBM.
Update: My LBM source code is now available on GitHub.
Upvotes: 2