ToughMind

Reputation: 1009

Invalid device id when using pytorch dataparallel!

Environment:

Problem:

I am using DataParallel in PyTorch to train on my two 2080 Ti GPUs. The code looks like this:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = Darknet(opt.model_def)  
model.apply(weights_init_normal) 

model = nn.DataParallel(model, device_ids=[0, 1]).to(device)

But when I run this code, I get the error below:

Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/PyTorch-YOLOv3-master/train.py", line 74, in <module>
    model = nn.DataParallel(model, device_ids=[0, 1]).to(device)
  File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 133, in __init__
    _check_balance(self.device_ids)
  File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in <listcomp>
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\cuda\__init__.py", line 337, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

When I step into it with the debugger, I find that device_count(), called inside get_device_properties(), returns 1 even though I have 2 GPUs in my machine. Yet torch._C._cuda_getDeviceCount() returns 2 in the Anaconda Prompt. What is wrong?

Question:

How can I solve this problem and use both GPUs with DataParallel? Thank you guys!

Upvotes: 5

Views: 11100

Answers (1)

cerebrou

Reputation: 5550

Basically, as pointed out by @ToughMind, we need to specify which devices are visible, and do so before CUDA is initialized:

os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1"

The right value depends on the CUDA devices available on one's machine, so someone with a single GPU would instead put, for example,

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
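To make this a bit more robust, here is a minimal sketch that sets the variable before torch is imported and then only passes device ids that PyTorch can actually see. The helper names usable_device_ids and wrap_for_gpus are hypothetical (they are not part of PyTorch or the YOLOv3 repo), and the value "0,1" assumes a machine with two GPUs:

```python
import os

# Assumption: two GPUs. Set this before the first CUDA call (safest: before
# importing torch), because the variable is read when CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"


def usable_device_ids(requested, visible_count):
    """Keep only the requested ids that index a visible device."""
    return [i for i in requested if 0 <= i < visible_count]


def wrap_for_gpus(model, requested=(0, 1)):
    """Hypothetical helper: use DataParallel only when several GPUs are visible."""
    # Imported here so the environment variable above is already in place.
    import torch
    import torch.nn as nn

    ids = usable_device_ids(requested, torch.cuda.device_count())
    if not ids:
        return model  # no visible GPU: leave the model on the CPU
    # DataParallel expects the parameters to live on the first listed device.
    model = model.to(torch.device("cuda", ids[0]))
    if len(ids) > 1:
        return nn.DataParallel(model, device_ids=ids)
    return model
```

In the question's script this would replace the nn.DataParallel(...) line with model = wrap_for_gpus(model). If torch.cuda.device_count() still reports 1, the ids are filtered down to [0] and the "Invalid device id" assertion is avoided, though only one GPU is used in that case.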

Upvotes: 5
