Gilbert

How to Bind Each MPI Process to a Fixed Portion of GPU Cores on a Single GPU

I am currently working with MPI and CUDA on a single GPU. I want each MPI process to be strictly bound to a specific subset of the GPU's streaming multiprocessors (SMs), and for that binding to remain fixed for the entire run, so that GPU resources are never reassigned dynamically. However, CUDA does not expose direct control over which SMs a process's kernels run on.
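For context, the closest thing CUDA offers here is observation rather than control: a kernel can read the `%smid` PTX special register to learn which SM each block landed on, but it cannot request a placement. A minimal sketch of the "SM masking" workaround built on this (the kernel name and the chosen SM range are illustrative, not a CUDA API):

```cuda
#include <cstdio>

// Read the ID of the SM the calling thread is running on.
// %smid is a PTX special register; placement is decided by the
// hardware scheduler and cannot be requested from the host.
__device__ unsigned int get_smid(void) {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// "SM masking" sketch: blocks that land outside the SM range assigned
// to this process exit immediately. This only approximates a partition:
// skipped blocks are wasted and there is no hard guarantee, so real
// uses pair this with a persistent-kernel work queue.
__global__ void masked_kernel(unsigned int sm_lo, unsigned int sm_hi) {
    unsigned int sm = get_smid();
    if (sm < sm_lo || sm >= sm_hi) return;   // not our share of the GPU
    if (threadIdx.x == 0)
        printf("block %d running on SM %u\n", blockIdx.x, sm);
}

int main(void) {
    // Oversubscribe blocks so some land on every SM, then let only
    // SMs [0, 4) do work for this (hypothetical) process.
    masked_kernel<<<64, 128>>>(0u, 4u);
    cudaDeviceSynchronize();
    return 0;
}
```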

I have a single CPU, and I am using its different cores to run separate MPI processes, with one process pinned to each core. I would like each MPI process to be assigned a fixed, exclusive share of the GPU's computational resources, with no interference or contention between processes. Specifically, each process should operate on a distinct set of SMs (or equivalent compute units) that does not change during the computation.

Additionally, I am wondering whether I should use Docker to isolate each MPI process, so that the processes cannot interfere with one another and their resource allocations stay fixed throughout execution. Would Docker help here, or are there better alternatives for achieving this kind of isolation?
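For comparison: Docker (with the NVIDIA Container Toolkit) can isolate CPU, memory, and which whole GPUs a container sees, but it does not partition SMs within a single GPU. CUDA MPS's active-thread-percentage limit is a closer fit for capping each process's share. A hedged launch sketch, assuming an MPS-capable NVIDIA driver on device 0 and a hypothetical binary `./my_app`:

```shell
# Start the MPS control daemon once for the GPU (a limit mechanism,
# not a hard SM reservation).
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# Cap the fraction of SM thread resources each rank's CUDA contexts
# may use; the variable is inherited by all ranks launched by mpirun.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 mpirun -np 4 ./my_app
```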

I’m trying to achieve a setup where each CPU core controls a specific portion of the GPU cores to execute some computational tasks. Additionally, I want to use MPI for communication between the processes, such as exchanging training parameters.
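The parameter-exchange part is standard MPI regardless of how the GPU is partitioned. A minimal sketch of averaging per-rank training parameters with `MPI_Allreduce` (the array size and dummy values are illustrative):

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: each rank holds a local copy of "training parameters"
 * and averages them across all ranks with MPI_Allreduce. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    enum { NPARAMS = 4 };
    double local[NPARAMS], avg[NPARAMS];
    for (int i = 0; i < NPARAMS; ++i)
        local[i] = rank + 0.1 * i;          /* dummy per-rank parameters */

    /* Sum corresponding parameters across all ranks, then divide
     * by the rank count to get the element-wise average. */
    MPI_Allreduce(local, avg, NPARAMS, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < NPARAMS; ++i)
        avg[i] /= size;

    if (rank == 0)
        printf("averaged parameter 0 = %f\n", avg[0]);

    MPI_Finalize();
    return 0;
}
```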

So far, I haven’t made any effective attempts because I’m unsure where to start or how to implement this. Any advice or suggestions would be greatly appreciated. Thank you!

Upvotes: 0

Views: 42

Answers (0)
