Reputation: 3450
I'm trying to use two GPUs to train a model in PyTorch. I'm using `torch.nn.DataParallel`, but for some reason `nvidia-smi` reports that I'm only using one GPU.
The code is something along the lines of:
>>> import torch.nn as nn
>>> model = SomeModel()
>>> model = nn.DataParallel(model)
>>> model.to('cuda')
When I run the program and watch the output of `nvidia-smi`, I only see GPU 0 running. Would anybody know what the problem is?
Upvotes: 5
Views: 6647
Reputation: 3181
Note that `DataParallel` splits your input tensor along the first (batch) dimension by default. If it's unable to do so (e.g. if the input is not a tensor), it may silently fall back to using a single GPU. For more detail, see https://discuss.pytorch.org/t/while-using-nn-dataparallel-only-accessing-one-gpu/19807/19
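For illustration, here is a minimal sketch of the happy path, assuming two visible GPUs (the model and tensor shapes are made up for this example):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical model, purely for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(TinyNet()).to('cuda')

# A single batched tensor: DataParallel chunks dim 0 across the GPUs,
# e.g. a batch of 64 becomes two chunks of 32 on two GPUs.
x = torch.randn(64, 10, device='cuda')
out = model(x)

# If the forward input were, say, a plain Python list instead of a tensor,
# DataParallel could not chunk it and the work may land on one GPU.
```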
Upvotes: 0
Reputation: 928
You should be using `nn.DataParallel(model, [0, 1])` in order to use GPU #0 and GPU #1. The call `model.to('cuda')` afterwards is not necessary. You may be tempted to use `nn.DataParallel(model.to('cuda'), [0, 1])`, but this appears unnecessary as well.
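A minimal sketch of that call, reusing the question's `SomeModel` as a placeholder (note that the current PyTorch docs expect the module's parameters to be on `device_ids[0]`, so this sketch moves the model to the GPU first):

```python
import torch
import torch.nn as nn

class SomeModel(nn.Module):  # stand-in for the question's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

# device_ids lists the GPUs to replicate across; it defaults to all visible GPUs
model = nn.DataParallel(SomeModel().to('cuda'), device_ids=[0, 1])
out = model(torch.randn(64, 10, device='cuda'))
```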
Upvotes: 1
Reputation: 391
Had the same problem and solved it by realizing I was using it wrong. Maybe this will help someone save some time:
If you wrap your model like `model = nn.DataParallel(model)`, only calls to exactly this wrapped object are parallelized (a call meaning `model(input)`). So if your model has a linear layer and you use that layer directly, like `model.my_linear(input)`, the call will not be wrapped.
If you pass references to your model instance around in your code, you may end up holding a reference to the unwrapped model, which will also not work.
The easiest test is to wrap your model immediately before calling it and check whether that works. If it does, your earlier wrapping code probably has one of the issues above.
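A minimal sketch of the distinction (`my_linear` is a hypothetical submodule name; assumes CUDA is available):

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.my_linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.my_linear(x)

model = nn.DataParallel(Net()).to('cuda')
x = torch.randn(64, 10, device='cuda')

out = model(x)                   # goes through the wrapper: split across GPUs
out = model.module.my_linear(x)  # bypasses the wrapper: runs on one GPU only
```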
Upvotes: 2