Reputation: 3534
I'm working on a project where the model requires access to a tensor that I declare in the constructor (__init__) of the class (I'm sub-classing torch.nn.Module), and then I need to use this tensor in the forward() method via a simple matmul(). The model is sent to the GPU via a cuda() call:
model = Model()
model.cuda()
However when i do forward-propagation of a simple input X through:
model(X) # or model.forward(X)
I get
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'
Indicating that the second argument of matmul (the instance tensor I declared) is on the CPU, while it was expected to be on the GPU (like the rest of the model and the data).
In the matmul, the tensor is transposed via matrix.t().
I even tried overriding the cuda() method through:
def cuda(self):
    super().cuda()
    self.matrix.cuda()
The data is already on the GPU, meaning the following line of code was already executed:
X = X.cuda()
Also, the error explicitly says argument #2 of matmul, which in this case is the tensor (called matrix), not X.
Upvotes: 9
Views: 7722
Reputation: 15200
I would like to highlight this from @Vaisakh's answer:
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
In other words, as @Umang_Gupta says in his comment:
# if m is a Module, you do:
m.cuda()
# if t is a Tensor, you do:
t = t.cuda()
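The same out-of-place vs. in-place distinction can be seen without a GPU by using the dtype-conversion methods, which follow the same convention as cuda() (a minimal CPU-only sketch, not from the original answer):

```python
import torch
import torch.nn as nn

t = torch.zeros(3)                # a plain float32 tensor
t.double()                        # returns a float64 copy; the result is discarded
assert t.dtype == torch.float32   # original tensor is unchanged

t = t.double()                    # reassign to actually keep the converted tensor
assert t.dtype == torch.float64

m = nn.Linear(3, 3)               # a Module
m.double()                        # converts all parameters in place and returns self
assert m.weight.dtype == torch.float64
```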
Upvotes: 8
Reputation: 331
Let's assume the following:
- X is moved correctly to the GPU
- The tensor declared in the Model class is a simple attribute
i.e. something like the following:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
If so, your first attempt wouldn't work because the nn.Module.cuda() method only moves all of the Parameters and Buffers to the GPU.
You would need to make Model.matrix a Parameter instead of a regular attribute.
Wrap it in the nn.Parameter class.
Something like:
self.matrix = nn.Parameter(torch.randn(784, 10))
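As a quick sanity check (a small sketch reusing the names from the snippets above): once matrix is wrapped in nn.Parameter, it shows up in model.named_parameters(), which is exactly the set of tensors nn.Module.cuda() moves, while a plain tensor attribute does not:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # registered: will be moved by model.cuda() / model.to(device)
        self.matrix = nn.Parameter(torch.randn(784, 10))
        # NOT registered: a plain attribute is invisible to model.cuda()
        self.plain = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
names = dict(model.named_parameters())
assert "matrix" in names
assert "plain" not in names
```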
Now, instead of automatically casting to the GPU like above, you tried to manually call the .cuda() method on Model.matrix within the override.
This doesn't work either, because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method.
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to the GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
The original tensor is unaffected.
In summary, either:
- make the matrix attribute a Parameter, or
- use self.matrix = self.matrix.cuda() in your override.
I would suggest the first.
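If you do go with the second option, the override has to reassign the attribute, since Tensor.cuda() returns a copy rather than moving in place. A hedged sketch (the is_available() guard is only there so the snippet also runs on CPU-only machines):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)  # plain attribute, not registered

    def forward(self, x):
        return torch.matmul(x, self.matrix)

    def cuda(self, device=None):
        # Reassign: Tensor.cuda() returns a GPU copy, it does not move in place
        self.matrix = self.matrix.cuda(device)
        return super().cuda(device)

model = Model()
if torch.cuda.is_available():
    model.cuda()
    out = model(torch.randn(2, 784).cuda())
    assert out.is_cuda
```

A third option, in line with the answer's remark about Buffers: if matrix is not meant to be trained, registering it with self.register_buffer("matrix", torch.randn(784, 10)) also makes nn.Module.cuda() move it automatically.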
Upvotes: 14