Reputation: 1036
I am having a lot of problems using nn.DistributedDataParallel, because I cannot find a good working example of how to specify GPU IDs within a single node. For this reason, I want to start off by using nn.DataParallel, since it should be easier to implement. According to the documentation [https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html], the following should work:
import torch

device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
model = Model(arg).to(device)
model = torch.nn.DataParallel(model, device_ids=[1, 8, 9])

for step, (original, keypoints) in enumerate(train_loader):
    original, keypoints = original.to(device), keypoints.to(device)
    loss = model(original)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
However, when I start the process, the model is distributed to all three GPUs, but the training doesn't start. The RAM of the GPUs remains almost empty (except for the memory used for loading the model). This can be seen here (see GPUs 1, 8, 9):
Can someone explain to me why that's not working?
Thanks a lot!!
Upvotes: 2
Views: 4881
Reputation: 979
I am making a guess here, and I haven't tested it since I don't have multiple GPUs: you're supposed to wrap the model in DataParallel first and then move it to the GPU:
model = Model(arg)
model = torch.nn.DataParallel(model, device_ids=[1, 8, 9])  # wrap in DataParallel first
model.to(device)                                            # then move it to the device
You can check out the tutorial I referenced here: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
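Putting that ordering together with the training loop from your question, a minimal sketch could look like the following. The Model, arg, train_loader and optimizer names are taken from your question, the torch.optim.Adam optimizer is just a placeholder, and it assumes (as in your snippet) that the forward pass returns a scalar loss:

import torch

device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')

model = Model(arg)                                           # build the model on the CPU
model = torch.nn.DataParallel(model, device_ids=[1, 8, 9])   # wrap it in DataParallel first
model.to(device)                                             # then move it onto the GPU

optimizer = torch.optim.Adam(model.parameters())             # placeholder optimizer

for step, (original, keypoints) in enumerate(train_loader):
    original, keypoints = original.to(device), keypoints.to(device)
    loss = model(original)                                   # assumes the model returns a scalar loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Note that the model's parameters must end up on device_ids[0] (here cuda:1) before the forward pass, which is why device and device_ids[0] match in this sketch.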
Upvotes: 2