Reputation: 85
I am trying to optimize a PyTorch tensor which I am also using as input to a network. Let's call this tensor "shape". My optimizer is as follows:
optimizer = torch.optim.Adam(
    [shape],
    lr=0.0001
)
I am also getting vertex values using this "shape" tensor:
vertices = model(shape)
And my loss function calculates the loss as the difference between the inferred vertices and the ground-truth vertices:
loss = torch.sqrt(((gt_vertices - vertices) ** 2).sum(2)).mean(1).mean()
So what I am actually doing is estimating the shape value; I am only interested in the shape values. This works perfectly fine when everything is on the CPU. However, when I put my shape and model on the GPU by calling to("cuda"), I get the classic non-leaf Tensor error:
ValueError: can't optimize a non-leaf Tensor
Calling .detach().cpu() on shape inside the optimizer solves the issue, but then gradients cannot flow as they should and my values are not updated. How can I make this work?
Upvotes: 0
Views: 395
Reputation: 40728
When calling .to('cuda'), e.g. shape_p = shape.to('cuda'), you are making a copy of shape. While shape remains a leaf tensor, shape_p is not, because its 'parent' tensor is shape. Therefore shape_p is not a leaf, and trying to optimize it raises the error.
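You can check this directly with the is_leaf flag; a minimal sketch, assuming a CUDA device is available:
>>> shape = torch.rand(1, requires_grad=True)  # created directly: a leaf tensor
>>> shape_p = shape.to('cuda')                 # copy produced by an op on shape: not a leaf
>>> shape.is_leaf, shape_p.is_leaf
(True, False)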
Sending it to the CUDA device after having set up the optimizer would solve the issue (there are certain instances when this isn't possible though, see here).
>>> optimizer = torch.optim.Adam([shape], lr=0.0001)
>>> shape = shape.cuda()
The best option though, in my opinion, is to create the tensor directly on the device at initialization:
>>> shape = torch.rand(1, requires_grad=True, device='cuda')
>>> optimizer = torch.optim.Adam([shape], lr=0.0001)
This way it remains a leaf tensor.
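For completeness, here is a minimal sketch of the full optimization loop built this way; the model, tensor sizes, and step count are placeholders I made up, not your actual setup:

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# stand-ins for the real model and ground-truth vertices (hypothetical)
model = torch.nn.Linear(10, 10).to(device)
gt_vertices = torch.rand(4, 3, 10, device=device)

# create shape directly on the device so it stays a leaf tensor
shape = torch.rand(4, 3, 10, requires_grad=True, device=device)
optimizer = torch.optim.Adam([shape], lr=0.0001)

for _ in range(100):
    optimizer.zero_grad()
    vertices = model(shape)
    loss = torch.sqrt(((gt_vertices - vertices) ** 2).sum(2)).mean(1).mean()
    loss.backward()
    optimizer.step()

Since shape is a leaf on the GPU, the optimizer updates it in place on each step and gradients flow as expected.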
Upvotes: 1