Reputation: 85
I am trying to optimize a PyTorch tensor which I am also using as input to a network. Let's call this tensor "shape". My optimizer is as follows:
optimizer = torch.optim.Adam(
    [shape],
    lr=0.0001
)
I am also getting vertex values using this "shape" tensor:
vertices = model(shape)
And my loss function calculates the loss as the difference between the inferred vertices and the ground-truth vertices:
loss = torch.sqrt(((gt_vertices - vertices) ** 2).sum(2)).mean(1).mean()
So what I am actually doing is estimating the shape value; I am only interested in the shape values. This works perfectly fine when everything is on the CPU. However, when I put my shape and model on the GPU by calling to("cuda"), I get the classic non-leaf Tensor error:
ValueError: can't optimize a non-leaf Tensor
Calling .detach().cpu() on shape inside the optimizer solves the issue, but then gradients cannot flow as they should and my values are not updated. How can I make this work?
Upvotes: 0
Views: 395
Reputation: 40728
When calling .to('cuda'), e.g. shape_p = shape.to('cuda'), you are making a copy of shape. While shape remains a leaf tensor, shape_p is not, because its 'parent' tensor is shape. Therefore shape_p is not a leaf, and trying to optimize it raises the error.
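You can check this directly with the is_leaf flag; a minimal sketch, assuming a CUDA device is available:
>>> shape = torch.rand(1, requires_grad=True)  # created directly: a leaf tensor
>>> shape_p = shape.to('cuda')                 # copy produced by an op on shape: not a leaf
>>> shape.is_leaf, shape_p.is_leaf
(True, False)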
Sending it to the CUDA device after having set up the optimizer would solve the issue (there are certain instances when this isn't possible though, see here).
>>> optimizer = torch.optim.Adam([shape], lr=0.0001)
>>> shape = shape.cuda()
The best option though, in my opinion, is to create the tensor directly on the device at initialization:
>>> shape = torch.rand(1, requires_grad=True, device='cuda')
>>> optimizer = torch.optim.Adam([shape], lr=0.0001)
This way it remains a leaf tensor.
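For completeness, here is a minimal sketch of the full optimization loop built this way; the model, tensor sizes, and step count are placeholders I made up, not your actual setup:

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# stand-ins for the real model and ground-truth vertices (hypothetical)
model = torch.nn.Linear(10, 10).to(device)
gt_vertices = torch.rand(4, 3, 10, device=device)

# create shape directly on the device so it stays a leaf tensor
shape = torch.rand(4, 3, 10, requires_grad=True, device=device)
optimizer = torch.optim.Adam([shape], lr=0.0001)

for _ in range(100):
    optimizer.zero_grad()
    vertices = model(shape)
    loss = torch.sqrt(((gt_vertices - vertices) ** 2).sum(2)).mean(1).mean()
    loss.backward()
    optimizer.step()

Since shape is a leaf on the GPU, the optimizer updates it in place on each step and gradients flow as expected.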
Upvotes: 1