Reputation: 575
I'm running my program on a cluster. Each node has 2 GPUs, and each MPI task calls a CUDA function.
My question is: if two MPI processes are running on each node, will the CUDA function calls be scheduled on different GPUs, or will they both run on the same one? And what if I run 4 MPI tasks per node?
Upvotes: 1
Views: 1351
Reputation: 456
Each MPI task's CUDA function runs on whichever GPU you choose. You select the GPU with cudaSetDevice(); since each node contains 2 GPUs, you can switch between them with cudaSetDevice(0) and cudaSetDevice(1). If you don't select a device with cudaSetDevice() (typically by combining it with the MPI task's rank), I believe both MPI tasks will run their CUDA functions on the same default GPU (device 0), serialized. Furthermore, if you run 3 or more MPI tasks per node, at least two of them will necessarily share a GPU, so their kernels will contend for the same device and run serially.
Upvotes: 3
Reputation: 72349
MPI and CUDA are basically orthogonal: you will have to manage MPI process-GPU affinity yourself explicitly. To do this, compute exclusive mode is pretty much mandatory for each GPU. You can then use a split communicator with coloring to enforce process-GPU affinity, once each process has found a free device it can establish a context on.
Massimo Fatica from NVIDIA posted a useful code snippet on the NVIDIA forums a while ago that might get you started.
Upvotes: 2