Reputation: 1009
I am using nn.DataParallel in PyTorch to train on two 2080 Ti GPUs. The code looks like this:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Darknet(opt.model_def)
model.apply(weights_init_normal)
model = nn.DataParallel(model, device_ids=[0, 1]).to(device)
But when I run this code, I get the following error:
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/PyTorch-YOLOv3-master/train.py", line 74, in <module>
model = nn.DataParallel(model, device_ids=[0, 1]).to(device)
File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 133, in __init__
_check_balance(self.device_ids)
File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in _check_balance
dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in <listcomp>
dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\cuda\__init__.py", line 337, in get_device_properties
raise AssertionError("Invalid device id")
AssertionError: Invalid device id
When I step into it with a debugger, I find that device_count(), called inside get_device_properties(), returns 1 even though I have 2 GPUs in my machine. Yet torch._C._cuda_getDeviceCount() returns 2 when I run it in Anaconda Prompt. What is wrong?
How can I solve this problem and use both GPUs with nn.DataParallel? Thank you!
Upvotes: 5
Views: 11100
Reputation: 5550
As @ToughMind pointed out, we need to specify which CUDA devices are visible to the process:
os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1"
The right value depends on the CUDA devices available on the machine. With a single GPU, for example, it would be:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
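To put this together with the code from the question, here is a minimal sketch rather than a definitive fix. It assumes the variable is set at the very top of the script, before PyTorch initializes CUDA, since the variable is only read at that point. The helpers usable_device_ids and wrap_model are hypothetical names introduced for illustration; they are not part of PyTorch or of the original train.py.

```python
import os

# Assumption: this line runs before torch touches CUDA. Setting it later in
# the same process has no effect on which devices are visible.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"


def usable_device_ids(requested, available):
    """Keep only the requested GPU ids that exist among `available` devices.

    Hypothetical helper: avoids passing an id to DataParallel that the
    process cannot see, which is what raises "Invalid device id".
    """
    return [i for i in requested if 0 <= i < available]


def wrap_model(model, requested=(0, 1)):
    """Wrap `model` in nn.DataParallel when more than one GPU is visible.

    Hypothetical helper. torch is imported here, after the environment
    variable above has been set.
    """
    import torch
    from torch import nn

    ids = usable_device_ids(requested, torch.cuda.device_count())
    if len(ids) > 1:
        model = nn.DataParallel(model, device_ids=ids)
    device = torch.device("cuda" if ids else "cpu")
    return model.to(device)
```

In the question's script this would replace the nn.DataParallel line, for example model = wrap_model(model). If torch.cuda.device_count() still reports 1 inside the IDE but 2 in Anaconda Prompt, printing os.environ.get("CUDA_VISIBLE_DEVICES") in both places may show whether the two environments differ.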
Upvotes: 5