gspr

Reputation: 11227

Using `DataParallel` when network needs a shared (constant) `Tensor`

I would like to use `DataParallel` to distribute my computations across multiple GPUs along the batch dimension. My network requires a `Tensor` (let's call it A) internally, which is constant and doesn't change during optimization. It seems that `DataParallel` does not automatically copy this `Tensor` to all the GPUs in question, so the network complains that the chunk of the input data x it sees resides on a different GPU than A.

Is there a way `DataParallel` can handle this situation automatically? Alternatively, is there a way to copy a `Tensor` to all GPUs? Or should I just keep one `Tensor` per GPU and manually figure out which copy to use, depending on which GPU the chunk seen by `forward` resides on?
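To make the situation concrete, here is a minimal sketch of roughly what my setup looks like (module name and sizes are made up for illustration). Because A is stored as a plain attribute, the replicas that `DataParallel` creates on the other GPUs still reference the copy living on cuda:0:

```python
import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)
        # Constant tensor, never optimized; lives on cuda:0 only.
        self.A = torch.eye(16, device="cuda:0")

    def forward(self, x):
        # x arrives as a chunk on some GPU, but self.A is always on cuda:0,
        # so x @ self.A raises a device-mismatch error on the other GPUs.
        return self.linear(x @ self.A)

net = nn.DataParallel(MyNet().to("cuda:0"))
out = net(torch.randn(8, 16, device="cuda:0"))  # fails with more than one GPU
```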

Upvotes: 2

Views: 1008

Answers (1)

Szymon Maszke

Reputation: 24814

You should wrap your tensor in `torch.nn.Parameter` and set `requires_grad=False` during its creation.

`torch.nn.Parameter` does not mean the tensor has to be trainable.

It merely means it is part of the model and should be transferred if needed (e.g. to multiple GPUs).

If that weren't the case, there would be no way for torch to know which tensors inside `__init__` are part of the model (you could do some operations on tensors and assign them to `self` just to get something done).

I don't see a need for another function to do just that, though the name might be a little confusing.
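A minimal sketch of this suggestion (module name and sizes are illustrative, not from your code): registering A as a non-trainable `Parameter` makes it part of the model, so `DataParallel` copies it to every replica alongside the other weights.

```python
import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)
        # Constant tensor registered as a non-trainable parameter:
        # it is replicated to each GPU but never receives gradients.
        self.A = nn.Parameter(torch.eye(16), requires_grad=False)

    def forward(self, x):
        # self.A now lives on the same GPU as the chunk x in every replica.
        return self.linear(x @ self.A)

net = nn.DataParallel(MyNet().to("cuda:0"))
out = net(torch.randn(8, 16, device="cuda:0"))  # works across GPUs
```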

Upvotes: 3
