Reputation: 9
I want to train my model with 2 GPUs (ids 5 and 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get id 0 rather than 5 or 6, while torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?
Upvotes: 0
Views: 1117
Reputation: 1600
You can check the device name to verify that it is the correct GPU. When you set CUDA_VISIBLE_DEVICES outside the script, you force torch to look only at those two GPUs, so torch re-indexes them as 0 and 1. That is why torch.cuda.current_device() outputs 0.
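A minimal sanity check along these lines (assuming the script was started with CUDA_VISIBLE_DEVICES=5,6):

import torch

# With CUDA_VISIBLE_DEVICES=5,6, PyTorch re-indexes the two visible GPUs as 0 and 1.
print(torch.cuda.device_count())    # expected: 2
print(torch.cuda.current_device())  # expected: 0 (first visible GPU, i.e. physical GPU 5)
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))  # should print the names of GPUs 5 and 6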
Upvotes: 0
Reputation: 4826
It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are actually your GPUs 5 and 6.
Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
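As an illustration, here is a minimal sketch that uses both visible GPUs through the remapped indices 0 and 1 (the model and batch are placeholders, and DataParallel is just one way to split the work):

import torch
import torch.nn as nn

# Under CUDA_VISIBLE_DEVICES=5,6, physical GPUs 5 and 6 show up as cuda:0 and cuda:1.
device = torch.device("cuda:0")

model = nn.Linear(10, 2)                              # placeholder model
model = nn.DataParallel(model, device_ids=[0, 1]).to(device)

x = torch.randn(8, 10, device=device)                 # dummy batch
out = model(x)                                         # forward pass is split across both visible GPUs
print(out.shape)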
Upvotes: 1