arc_lupus

Reputation: 4114

Distribution of MPI-threaded program on available GPUs, using slurm

My program consists of two parts, A and B, both written in C++. B is loaded from a separate DLL and can run either on the CPU or on the GPU, depending on how it is linked. When the main program is launched, it creates one instance of A, which in turn creates one instance of B (which then works either on the locally available CPUs or on the first GPU).
When the program is launched via mpirun (or via Slurm, which in turn launches mpirun), one instance of A is created for each MPI rank, and each A creates its own instance of B. When there is only one GPU in the system, that GPU is used, but what happens when there are multiple GPUs? Are all the instances of B placed on the same GPU even though several GPUs are available, or are they distributed evenly across them?
Is there any way to influence that behavior? Unfortunately, my development machine does not have multiple GPUs, so I cannot test this anywhere except in production.

Upvotes: 0

Views: 405

Answers (1)

PTooley

Reputation: 391

Slurm supports binding MPI ranks to GPUs, for example through the --gpu-bind option: https://slurm.schedmd.com/gres.html. Assuming the cluster is configured to enforce GPU affinities, this allows you to assign one GPU per rank even when there are multiple ranks on a single node.
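As a minimal sketch (the script name, task count, and GPU count below are placeholders for whatever your cluster actually provides), a batch script requesting one GPU per rank could look like this:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4   # one MPI rank per GPU on this node
    #SBATCH --gres=gpu:4          # number of GPUs available per node

    # --gpu-bind=single:1 binds each task to a single GPU of its own
    srun --gpu-bind=single:1 ./main_program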

If you want to test this, you could, for example, use the cudaGetDevice and cudaGetDeviceProperties calls to get the device LUID (locally unique ID) for each rank and then check that there is no duplication of LUIDs within a node.
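A minimal sketch of such a check might look like the following; this is my own illustration rather than part of the original answer, and it uses the PCI bus ID (via cudaDeviceGetPCIBusId) as the per-node device identifier, since the LUID field of cudaDeviceProp is, as far as I know, only meaningful on Windows:

    // check_gpu_binding.cpp -- print which GPU each MPI rank ended up on.
    // Build (assumed toolchain): mpicxx check_gpu_binding.cpp -lcudart -o check_gpu_binding
    #include <cstdio>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // The device the CUDA runtime selected for this rank.
        int device = -1;
        cudaGetDevice(&device);

        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, device);

        // The PCI bus ID is unique per GPU within a node; if two ranks on the
        // same host print the same ID, they are sharing a GPU.
        char busId[64] = {0};
        cudaDeviceGetPCIBusId(busId, sizeof(busId), device);

        char host[MPI_MAX_PROCESSOR_NAME] = {0};
        int hostLen = 0;
        MPI_Get_processor_name(host, &hostLen);

        std::printf("rank %d on %s: device %d (%s), PCI %s\n",
                    rank, host, device, prop.name, busId);

        MPI_Finalize();
        return 0;
    }

Running this under the same srun invocation as the real program shows immediately whether each rank on a node sees a different GPU or whether they all end up on device 0.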

Upvotes: 2
