I am a beginner in PyTorch and I am facing the following issue:
When I compute the gradient through the tensor below (note that I use an intermediate variable x, as you can see), everything works and I get the gradient:
import torch
myTensor = torch.randn(2, 2, requires_grad=True)
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()
print(myTensor.grad)
Now, if I try to modify an element of myTensor, I get the error "leaf variable has been moved into the graph interior". See this code:
import torch
myTensor = torch.randn(2, 2, requires_grad=True)
myTensor[0, 0] *= 5
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()
print(myTensor.grad)
What is wrong with the latter code, and how do I correct it?
Any help would be highly appreciated. Thanks a lot!
The problem here is that this line represents an in-place operation:
myTensor[0, 0] *= 5
And PyTorch, or more precisely autograd, is not very good at handling in-place operations, especially on tensors whose requires_grad flag is set to True.
You can also take a look here:
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
Generally, you should avoid in-place operations wherever possible. In some cases they can work, but you should always avoid in-place operations on tensors where requires_grad is set to True.
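To make the general point concrete, here is a minimal sketch (the names t and t2 are my own, not from the question): the in-place update on a leaf that requires a gradient is what autograd objects to, while the out-of-place version simply creates a new, non-leaf tensor.
import torch
t = torch.randn(3, requires_grad=True)
# t *= 2          # in-place on a leaf that requires grad: autograd complains
t2 = t * 2        # out-of-place equivalent: creates a new, non-leaf tensor
t2.sum().backward()
print(t.grad)     # tensor([2., 2., 2.])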
Unfortunately there are not many PyTorch functions to help out with this problem, so you have to use a helper tensor to avoid the in-place operation in this case:
Code:
import torch
myTensor = torch.randn(2, 2, requires_grad=True)
helper_tensor = torch.ones(2, 2)
helper_tensor[0, 0] = 5
new_myTensor = myTensor * helper_tensor  # new tensor, out-of-place operation
with torch.enable_grad():
    x = new_myTensor.sum() * 10          # of course you need to use the new tensor
x.backward()                             # for further calculation and backward
print(myTensor.grad)
Output:
tensor([[50., 10.],
        [10., 10.]])
Unfortunately this is not very nice, and I would appreciate it if there were a better or nicer solution out there. But as far as I know, in the current version (0.4.1) you will have to go with this workaround for tensors with gradients, i.e. requires_grad=True.
Hopefully future versions will have a better solution.
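One alternative sketch that I believe also works (it is not part of the original workaround): clone the leaf tensor and write the modified value into the clone. The clone is not a leaf, so autograd can track the assignment, and the resulting gradient is the same as with the helper tensor above.
import torch
myTensor = torch.randn(2, 2, requires_grad=True)
new_myTensor = myTensor.clone()          # the clone is not a leaf
new_myTensor[0, 0] = myTensor[0, 0] * 5  # write the scaled value into the clone
with torch.enable_grad():
    x = new_myTensor.sum() * 10
x.backward()
print(myTensor.grad)                     # tensor([[50., 10.], [10., 10.]])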
By the way, if you activate the gradient later, you can see that it works just fine:
import torch
myTensor = torch.randn(2, 2, requires_grad=False)  # no gradient so far
myTensor[0, 0] *= 5                                # in-place op not included in gradient
myTensor.requires_grad = True                      # activate gradient here
with torch.enable_grad():
    x = myTensor.sum() * 10
x.backward()                                       # no problem here
print(myTensor.grad)
But of course this will yield a different result:
tensor([[10., 10.],
        [10., 10.]])
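For completeness, a sketch of what I believe is an equivalent variant in more recent PyTorch versions: wrap the in-place modification in torch.no_grad() so it is not recorded in the graph. The gradient is then the same all-tens result as above.
import torch
myTensor = torch.randn(2, 2, requires_grad=True)
with torch.no_grad():
    myTensor[0, 0] *= 5      # in-place change of the data, not tracked by autograd
x = myTensor.sum() * 10
x.backward()
print(myTensor.grad)         # tensor([[10., 10.], [10., 10.]])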
Hope this helps!