Shahar

Reputation: 141

Second gpu not utilized in multi-GPU Pytorch script

I'm trying to use two processes to speed up a script that runs on a sequence of images (each image is its own optimization problem).

I'm using torch.multiprocessing to spawn two processes. Each process initializes its tensors, models, and optimizers on a different GPU:

import numpy as np
import torch.multiprocessing as mp

if __name__ == '__main__':
    num_processes = 2
    processes = []

    img_list = [...]
    img_indices = np.arange(0, len(img_list))

    for gpu_idx in range(num_processes):
        # Give each process every other image and (supposedly) its own GPU
        subindices = img_indices[gpu_idx::num_processes]
        p = mp.Process(target=my_single_gpu_optimization_func,
                       args=(img_list, subindices, gpu_idx))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

Inside my_single_gpu_optimization_func, I define the target device as:

device = f'cuda:{gpu}'
model = MyModel(device=device)

The idea is that each GPU processes half of the images.
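For context, the worker has roughly this shape (the body below is a simplified placeholder, not my actual optimization code):

def my_single_gpu_optimization_func(img_list, subindices, gpu):
    device = f'cuda:{gpu}'          # intended target GPU for this process
    model = MyModel(device=device)  # model tensors should live on that device
    for idx in subindices:
        # ... build the per-image tensors and optimizer on `device`
        # and run the optimization for img_list[idx] ...
        pass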

So when running I expect to see both GPUs loaded, but in practice the memory usage on the first GPU doubles compared to the single-GPU case, the runtime halves, and the second GPU seems to be idle.

Why am I unable to utilize both GPUs and double my throughput?

Upvotes: 0

Views: 298

Answers (1)

Shahar

Reputation: 141

What seems to work is to set CUDA_VISIBLE_DEVICES inside each process/thread function.

So:

import os

os.environ['CUDA_VISIBLE_DEVICES'] = f'{gpu}'  # must run before the first CUDA call in this process
my_model.to('cuda:0')  # 'cuda:0' now maps to physical GPU number `gpu`

This seems very crude. I might as well just run two instances of my code from the command line this way. Is there a cleaner way of doing this without setting environment variables?

(BTW, I'm not sure overriding environment variables would work with threads; I really do need to fork a separate process for this solution to work.)
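For completeness, here is a minimal sketch of the whole pattern (the worker body and MyModel are placeholders; the important part is that CUDA_VISIBLE_DEVICES is set before the first CUDA call in each child process):

import os
import numpy as np
import torch.multiprocessing as mp

def my_single_gpu_optimization_func(img_list, subindices, gpu):
    # Must happen before this process touches CUDA in any way
    os.environ['CUDA_VISIBLE_DEVICES'] = f'{gpu}'
    device = 'cuda:0'  # the only visible device is now physical GPU `gpu`
    model = MyModel(device=device)  # placeholder for the real model
    for idx in subindices:
        # ... per-image optimization for img_list[idx] on `device` ...
        pass

if __name__ == '__main__':
    # 'spawn' gives each worker a fresh process with no inherited CUDA state
    mp.set_start_method('spawn')

    num_processes = 2
    img_list = [...]
    img_indices = np.arange(0, len(img_list))

    processes = []
    for gpu_idx in range(num_processes):
        subindices = img_indices[gpu_idx::num_processes]
        p = mp.Process(target=my_single_gpu_optimization_func,
                       args=(img_list, subindices, gpu_idx))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()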

Upvotes: 1
