Shengyu Liu

Reputation: 9

PyTorch Multi-GPU Issue

I want to train my model on two GPUs (IDs 5 and 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 train.py. However, when I print torch.cuda.current_device() I still get 0 rather than 5 or 6. torch.cuda.device_count() returns 2, which seems right. How can I use GPUs 5 and 6 correctly?

Upvotes: 0

Views: 1117

Answers (2)

dtlam26

Reputation: 1600

You can check the device name to verify that each visible device is the GPU you expect. However, I think that when you set CUDA_VISIBLE_DEVICES outside the script, you force torch to look only at those two GPUs, so torch re-indexes them as 0 and 1. Because of this, current_device outputs 0 when you check it.
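A minimal sketch of the device-name check, assuming PyTorch is installed. The helper name visible_device_names is made up for this illustration; it returns an empty list when torch or CUDA is unavailable.

```python
def visible_device_names():
    """Return the names of the GPUs PyTorch can see, ordered by logical index."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: treat a missing torch as "no GPUs visible".
        return []
    if not torch.cuda.is_available():
        return []
    # Logical indices always start at 0, whatever CUDA_VISIBLE_DEVICES lists.
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    for index, name in enumerate(visible_device_names()):
        print(f"cuda:{index} -> {name}")
```

If the machine mixes GPU models, the printed names show which physical cards were picked up. With identical cards the names alone will not distinguish them.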

Upvotes: 0

hkchengrex

Reputation: 4826

It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1) which are actually your GPU 5 and 6.
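The renumbering can be sketched as a plain lookup, assuming CUDA_VISIBLE_DEVICES holds a simple comma-separated list of IDs. The function name logical_to_physical is hypothetical, and the IDs only line up with nvidia-smi when the CUDA device order matches the PCI bus order.

```python
import os


def logical_to_physical(visible):
    """Map the logical indices PyTorch uses to the entries of CUDA_VISIBLE_DEVICES."""
    entries = [entry.strip() for entry in visible.split(",") if entry.strip()]
    # The first listed device becomes logical index 0, the second index 1, and so on.
    return dict(enumerate(entries))


if __name__ == "__main__":
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    for logical, physical in logical_to_physical(visible).items():
        print(f"cuda:{logical} -> physical GPU {physical}")
```

With CUDA_VISIBLE_DEVICES=5,6 this gives {0: "5", 1: "6"}, so torch.device("cuda:0") refers to physical GPU 5 and torch.device("cuda:1") to physical GPU 6.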

Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())

Upvotes: 1
