Reputation: 3450
I'm trying to use two GPUs to train a model in PyTorch. I'm using `torch.nn.DataParallel`, but for some reason `nvidia-smi` reports that I'm only using one GPU.
The code is something along the lines of:
>>> import torch.nn as nn
>>> model = SomeModel()
>>> model = nn.DataParallel(model)
>>> model.to('cuda')
When I run the program and watch the output of `nvidia-smi`, I only see GPU 0 running. Would anybody know what the problem is?
Upvotes: 5
Views: 6647
Reputation: 3181
Note that `DataParallel` splits your input tensor along the first (batch) dimension by default. If it's unable to do so (e.g. if the input is not a tensor), it may silently fall back to using a single GPU. For more detail, see https://discuss.pytorch.org/t/while-using-nn-dataparallel-only-accessing-one-gpu/19807/19
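For illustration, here is a minimal sketch of the happy path, assuming two visible GPUs (the model and tensor shapes are made up for this example):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical model, purely for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(TinyNet()).to('cuda')

# A single batched tensor: DataParallel chunks dim 0 across the GPUs,
# e.g. a batch of 64 becomes two chunks of 32 on two GPUs.
x = torch.randn(64, 10, device='cuda')
out = model(x)

# If the forward input were, say, a plain Python list instead of a tensor,
# DataParallel could not chunk it and the work may land on one GPU.
```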
Upvotes: 0
Reputation: 928
You should be using `nn.DataParallel(model, [0, 1])` in order to use GPU #0 and GPU #1. The call `model.to('cuda')` afterwards is not necessary. You may be tempted to use `nn.DataParallel(model.to('cuda'), [0, 1])`, but this appears unnecessary as well.
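A minimal sketch of that call, reusing the question's `SomeModel` as a placeholder (note that the current PyTorch docs expect the module's parameters to be on `device_ids[0]`, so this sketch moves the model to the GPU first):

```python
import torch
import torch.nn as nn

class SomeModel(nn.Module):  # stand-in for the question's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

# device_ids lists the GPUs to replicate across; it defaults to all visible GPUs
model = nn.DataParallel(SomeModel().to('cuda'), device_ids=[0, 1])
out = model(torch.randn(64, 10, device='cuda'))
```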
Upvotes: 1
Reputation: 391
Had the same problem and solved it by realizing I was using it wrong. Maybe this will help someone save some time:
If you wrap your model like `model = nn.DataParallel(model)`, only calls to exactly this wrapped object are parallelized (a call meaning `model(input)`). So if your model has a linear layer and you use that layer directly, like `model.my_linear(input)`, the call will not be wrapped.
If you pass references to your model instance around in your code, you may end up holding a reference to the unwrapped model, which will also not work.
The easiest test is to wrap your model immediately before calling it and check whether that works. If it does, your earlier wrapping code probably has one of the issues above.
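A minimal sketch of the distinction (`my_linear` is a hypothetical submodule name; assumes CUDA is available):

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.my_linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.my_linear(x)

model = nn.DataParallel(Net()).to('cuda')
x = torch.randn(64, 10, device='cuda')

out = model(x)                   # goes through the wrapper: split across GPUs
out = model.module.my_linear(x)  # bypasses the wrapper: runs on one GPU only
```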
Upvotes: 2