Aly

Reputation: 413

Modifying a PyTorch tensor and then getting the gradient makes the gradient fail

I am a beginner in PyTorch and I am facing the following issue:

When I get the gradient of the tensor below (note that I derive an intermediate variable x from it, as you can see), the gradient is computed fine:

import torch
myTensor = torch.randn(2, 2, requires_grad=True)
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()
print(myTensor.grad)

Now, if I first try to modify an element of myTensor, I get the error "leaf variable has been moved into the graph interior". See this code:

import torch
myTensor = torch.randn(2, 2, requires_grad=True)
myTensor[0, 0] *= 5
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()
print(myTensor.grad)

What is wrong with my latter code? And how do I correct it?

Any help would be highly appreciated. Thanks a lot!

Upvotes: 10

Views: 6987

Answers (1)

MBT

Reputation: 24119

The problem here is that this line represents an in-place operation:

myTensor[0, 0] *= 5

PyTorch, or more precisely autograd, does not handle in-place operations well, especially on tensors with the requires_grad flag set to True.

You can also take a look here:
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd

Generally you should avoid in-place operations where possible. In some cases they can work, but you should always avoid in-place operations on tensors where you set requires_grad to True, as the short sketch below illustrates.
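
For illustration, here is a minimal sketch of the difference between an out-of-place and an in-place operation on such a tensor (the exact error message may differ between PyTorch versions):

import torch

x = torch.randn(2, 2, requires_grad=True)

# Out-of-place: a new tensor is created, autograd tracks it without problems
y = x * 2

# In-place on a leaf tensor that requires grad: autograd refuses this
try:
    x.mul_(2)
except RuntimeError as e:
    print(e)  # e.g. "a leaf Variable that requires grad is being used in an in-place operation"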

Unfortunately there are not many PyTorch functions to help with this problem, so you would have to use a helper tensor to avoid the in-place operation in this case:

Code:

import torch

myTensor = torch.randn(2, 2, requires_grad=True)
helper_tensor = torch.ones(2, 2)
helper_tensor[0, 0] = 5
new_myTensor = myTensor * helper_tensor  # new tensor, out-of-place operation
with torch.enable_grad():
    x = new_myTensor.sum() * 10  # of course you need to use the new tensor
x.backward()                     # for further calculation and backward
print(myTensor.grad)

Output:

tensor([[50., 10.],
        [10., 10.]])

Unfortunately this is not very nice, and I would appreciate it if there were a better or nicer solution out there.
But as far as I know, in the current version (0.4.1) you will have to go with this workaround for tensors with a gradient, i.e. requires_grad=True.

Hopefully for future versions there will be a better solution.
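
For what it's worth, here is a sketch of one out-of-place alternative that avoids the helper tensor, using torch.where (assuming a PyTorch version where torch.where accepts a boolean mask); it produces the same gradient as the workaround above:

import torch

myTensor = torch.randn(2, 2, requires_grad=True)

# Boolean mask selecting the element we want to scale
mask = torch.zeros(2, 2, dtype=torch.bool)
mask[0, 0] = True

# Out-of-place: take myTensor * 5 where the mask is set, myTensor elsewhere
new_myTensor = torch.where(mask, myTensor * 5, myTensor)

x = new_myTensor.sum() * 10
x.backward()
print(myTensor.grad)  # tensor([[50., 10.], [10., 10.]])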


Btw. if you activate the gradient later you can see that it works just fine:

import torch
myTensor = torch.randn(2, 2, requires_grad=False)  # no gradient so far
myTensor[0, 0] *= 5                                 # in-place op not included in gradient
myTensor.requires_grad = True                       # activate gradient here
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()                                        # no problem here
print(myTensor.grad)

But of course this will yield a different result, because the multiplication by 5 is applied to the data before autograd starts tracking, so the gradient is simply 10 everywhere:

tensor([[10., 10.],
        [10., 10.]])

Hope this helps!

Upvotes: 17
