Reputation: 71

OpenCL vs OpenMP: how much performance difference is there for LBM problems?

I would like to find a suitable GPU acceleration framework for the Lattice Boltzmann Method (LBM) or for conventional Navier-Stokes CFD.

CUDA is device-dependent (it only runs on Nvidia hardware), so I have already ruled it out.

OpenCL is around 3 times faster than OpenMP when doing CFD, according to https://arxiv.org/abs/1704.05316

However, that paper gives no comparison for LBM.

OpenCL also seems to be roughly twice as hard to code.

I am now deciding between OpenCL and OpenMP. How large a performance difference should I expect between the two for LBM problems?

Upvotes: 3

Answers (1)

ProjectPhysX

Reputation: 5736

I have implemented LBM in OpenCL; see my master's thesis. From testing my code on various GPUs and CPUs, and from comparing its performance with other multi-CPU implementations, I can say that LBM on a single GPU is about as fast as on ~2000-7000 CPU cores. The performance benefit really is massive, because LBM efficiency on CPUs is extremely poor for all CPU codes (~10-50%). On the GPU, LBM is bottlenecked solely by memory bandwidth, which is orders of magnitude larger than on CPUs.
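To illustrate what "bottlenecked solely by memory bandwidth" implies, here is a rough back-of-the-envelope sketch. It is not taken from the thesis or the code. It assumes that each lattice update reads and writes every one of the q distribution functions exactly once, and that nothing else (flags, caches, compute) matters. The function names and the bandwidth value in the usage lines are hypothetical placeholders, not measured data.

```python
def bytes_per_cell_update(q=19, bytes_per_value=4):
    """Memory traffic per lattice update in bytes.

    Assumption: each of the q distribution functions is read once and
    written once per time step (D3Q19 with FP32 -> q=19, 4 bytes each).
    Flag bytes and cache effects are ignored.
    """
    return 2 * q * bytes_per_value


def bandwidth_limited_mlups(bandwidth_gb_s, q=19, bytes_per_value=4):
    """Upper bound in million lattice updates per second.

    bandwidth_gb_s is the memory bandwidth in GB/s (1 GB = 1e9 bytes).
    """
    updates_per_second = bandwidth_gb_s * 1e9 / bytes_per_cell_update(q, bytes_per_value)
    return updates_per_second / 1e6


def efficiency(measured_mlups, bandwidth_gb_s, q=19, bytes_per_value=4):
    """Fraction of the bandwidth-limited bound that a measurement reaches."""
    return measured_mlups / bandwidth_limited_mlups(bandwidth_gb_s, q, bytes_per_value)


if __name__ == "__main__":
    # Hypothetical device with 1000 GB/s of memory bandwidth.
    limit = bandwidth_limited_mlups(1000.0)
    print(f"bandwidth-limited bound: {limit:.0f} million lattice updates per second")
```

Under these assumptions the achievable update rate scales linearly with memory bandwidth, which is why a device with much higher bandwidth gives a correspondingly higher LBM throughput.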

Also, on the Nvidia A100/V100 I get 97%/100% hardware efficiency (8800/5250 MLUPs/s for D3Q19 and FP32), so OpenCL carries no performance disadvantage compared to CUDA. I have verified that my code runs on Nvidia/AMD/Intel GPUs and Intel CPUs; it even runs on the Mali-G72 GPU of my smartphone.

So yes, I definitely recommend going with OpenCL for LBM.

Update: My LBM source code is now available on GitHub.

Upvotes: 2
