python deep-learning pytorch gradient tensor

Reputation: 211

PyTorch RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This code is built up as follows: my robot takes a picture, and a TensorFlow computer-vision model calculates where in the picture the target object starts. This information (the x1 and x2 coordinates) is passed to a PyTorch model, which should learn to predict the correct motor activations in order to get closer to the target. After the movement is executed, the robot takes a picture again, and the TensorFlow CV model should calculate whether the motor activation brought the robot closer to the desired state (x1 at 10, x2 at 31).

However, every time I run the code, PyTorch is not able to calculate the gradients.

I'm wondering whether this is a data-type problem or a more general one: is it impossible to calculate the gradients if the loss is not calculated directly from the PyTorch network's output?

Any help and suggestions will be greatly appreciated.

#define policy model (model to learn a policy for my robot)
import torch
import torch.nn as nn
import torch.nn.functional as F 
class policy_gradient_model(nn.Module):
    def __init__(self):
        super(policy_gradient_model, self).__init__()
        self.fc0 = nn.Linear(2, 2)
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64,32)
        self.fc4 = nn.Linear(32,32)
        self.fc5 = nn.Linear(32, 2)
    def forward(self,x):
        x = self.fc0(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return x

policy_model = policy_gradient_model().double()
print(policy_model)
optimizer = torch.optim.AdamW(policy_model.parameters(), lr=0.005, betas=(0.9,0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)

#make robot move as predicted by pytorch network (not all code included)
def move(motor_controls):
#define curvature
 #   motor_controls[0] = sigmoid(motor_controls[0])
    activation_left = 1+(motor_controls[0])*99
    activation_right = 1+(1- motor_controls[0])*99

    print("activation left:", activation_left, ". activation right:",activation_right, ". time:", motor_controls[1]*100)

#start movement

#main
import cv2
import numpy as np
import time
from torch.autograd import Variable
print("start training")
losses=[]
losses_end_of_epoch=[]
number_of_steps_each_epoch=[]
loss_function = nn.MSELoss(reduction='mean')

#each epoch
for epoch in range(2):
    count=0
    target_reached=False
    while target_reached==False:
        print("epoch: ", epoch, ". step:", count)
###process and take picture
        indices = process_picture()
###binary_network(sliced)=indices as input for policy model
        optimizer.zero_grad()
###output: 1 for curvature, 1 for duration of movement
        motor_controls = policy_model(Variable(torch.from_numpy(indices))).detach().numpy()
        print("NO TANH output for motor: 1)activation left, 2)time ", motor_controls)
        motor_controls[0] = np.tanh(motor_controls[0])
        motor_controls[1] = np.tanh(motor_controls[1])
        print("TANH output for motor: 1)activation left, 2)time ", motor_controls)
###execute suggested action
        move(motor_controls)
###take and process picture2 (after movement)
        indices = (process_picture())
###loss=(binary_network(picture2) - desired
        print("calculate loss")
        print("idx", indices, type(torch.tensor(indices)))
     #   loss = 0
      #  loss = (indices[0]-10)**2+(indices[1]-31)**2
       # loss = loss/2
        print("shape of indices", indices.shape)
        array=np.zeros((1,2))
        array[0]=indices
        print(array.shape, type(array))
        array2 = torch.ones([1,2])
        loss = loss_function(torch.tensor(array).double(), torch.tensor([[10.0,31.0]]).double()).float()
        print("loss: ", loss, type(loss), loss.shape)
       # array2[0] = loss_function(torch.tensor(array).double(), torch.tensor([[10.0,31.0]]).double()).float()
        losses.append(loss)
#start line causing the error-message (still part of main)
###calculate gradients
        loss.backward()
#end line causing the error-message (still part of main)

###apply gradients        
        optimizer.step()

#Output (so far as intended) (not all included)

#calculate loss
idx [14. 15.] <class 'torch.Tensor'>
shape of indices (2,)
(1, 2) <class 'numpy.ndarray'>
loss:  tensor(136.) <class 'torch.Tensor'> torch.Size([])

#Error Message:
Traceback (most recent call last):
  File "/home/pi/Desktop/GradientPolicyLearning/PolicyModel.py", line 259, in <module>
    array2.backward()
  File "/home/pi/.local/lib/python3.7/site-packages/torch/tensor.py", line 134, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/pi/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Upvotes: 21

Answers (7)

Lucas Basquerotto

Reputation: 8143

I had this problem and solved it by making sure that gradient computation is enabled. In the case above, it would be something like:

with torch.set_grad_enabled(True):
    indices = process_picture()
    #...
    loss.backward()

The scope may be reduced depending on which instructions actually generate or manipulate gradients.
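
To illustrate when this helps: the same error appears whenever the loss is built while gradient tracking is switched off, for example inside a torch.no_grad() block. The sketch below is a toy case; the tensor w and the sum loss are made up for illustration and are not taken from the question. The code shown in the question does not appear to disable gradients anywhere, so this may not be the cause there.

```python
import torch

w = torch.ones(3, requires_grad=True)  # toy parameter, illustration only

# Built while tracking is off: the result is cut off from the graph.
with torch.no_grad():
    loss_off = (w * 2).sum()

try:
    loss_off.backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)

# Re-enabling tracking around the forward pass restores the graph,
# even inside an outer no_grad block.
with torch.no_grad():
    with torch.set_grad_enabled(True):
        loss_on = (w * 2).sum()
loss_on.backward()

print(error_message)
print(w.grad)
```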

Upvotes: 0

Rajarshi Mandal

Reputation: 399

The following worked for me:

loss.requires_grad = True
loss.backward()
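
A word of caution, as a minimal sketch: if the loss was computed from values that never passed through the model's graph, setting requires_grad on the loss makes the error go away, but no gradient reaches the model, so the optimizer has nothing to apply. The small nn.Linear model below is a stand-in, not the question's network; the observed and target values are the ones printed in the question.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)  # stand-in model, illustration only

# Loss computed from data that never passed through the model's graph.
observed = torch.tensor([[14.0, 15.0]])
target = torch.tensor([[10.0, 31.0]])
loss = nn.MSELoss()(observed, target)

loss.requires_grad = True  # silences the RuntimeError ...
loss.backward()

# ... but the model parameters still have no gradient.
grads = [p.grad for p in model.parameters()]
print(loss.item(), grads)
```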

Upvotes: 14

Deepanshu Mehta

Reputation: 1790

A simple solution: turn gradient calculation on if it is off.

torch.set_grad_enabled(True)  # can also be used as a context manager

Upvotes: 11

kilojoules

Reputation: 10093

In my case, I got past this error by specifying requires_grad=True when defining my input tensors:

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('dark_background')

# define rosenbrock function and gradient
a = 1
b = 5
def f(x):
   return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def jac(x):
   dx1 = -2 * a + 4 * b * x[0] ** 3 - 4 * b * x[0] * x[1] + 2 * x[0]
   dx2 = 2 * b * (x[1] - x[0] ** 2)
   return np.array([dx1, dx2])

# create stochastic rosenbrock function and gradient
def f_rand(x):
   return f(x) * np.random.uniform(0.5, 1.5)

def jac_rand(x): return jac(x) * np.random.uniform(0.5, 1.5)

# use hand coded adam
x = np.array([0.1, 0.1])
x0 = x.copy()
j = jac_rand(x)
beta1=0.9
beta2=0.999
eps=1e-8
m = x * 0
v = x * 0
learning_rate = .1
for ii in range(200):
   m = (1 - beta1) * j + beta1 * m  # first  moment estimate.
   v = (1 - beta2) * (j ** 2) + beta2 * v  # second moment estimate.
   mhat = m / (1 - beta1 ** (ii + 1))  # bias correction.
   vhat = v / (1 - beta2 ** (ii + 1))
   x = x - learning_rate * mhat / (np.sqrt(vhat) + eps)
   x -= learning_rate * v
   j = jac_rand(x)

print('hand code finds optimal to be ', x, f(x))

# attempt to use pytorch
import torch
x_tensor = torch.tensor(x0, requires_grad=True)
optimizer = torch.optim.Adam([x_tensor], lr=learning_rate)

def closure():
   optimizer.zero_grad()
   loss = f_rand(x_tensor)
   loss.backward()
   return loss

for ii in range(200):
   optimizer.step(closure)

print('My PyTorch attempt found ', x_tensor, f(x_tensor))

Upvotes: 0

Maria

Reputation: 309

Make sure that all inputs to the NN, the output of the NN, and the ground-truth/target values are of type torch.Tensor and not list, numpy.array, or any other iterable.

Also, make sure that they are not converted to a list or numpy.array at any point.

In my case, I got this error because I performed a list comprehension on the tensor containing the predicted values from the NN, to get the max value in each row, and then converted the list back to a torch.Tensor before calculating the loss.

This back-and-forth conversion detaches the values from the computation graph, so no gradients can be calculated.
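
A small sketch of that situation, with made-up numbers: logits and scores stand in for a network's parameters and output. The list-comprehension version loses its grad_fn, while the equivalent tensor operation keeps it.

```python
import torch

logits = torch.tensor([[1.0, 3.0], [5.0, 2.0]], requires_grad=True)
scores = logits * 2  # stands in for a network output

# Round trip through Python floats: the result has no grad_fn.
row_max_list = torch.tensor([row.max().item() for row in scores])

# Staying in tensor operations keeps the graph intact.
row_max = scores.max(dim=1).values

print(row_max_list.grad_fn, row_max_list.requires_grad)
print(row_max.grad_fn is not None)

row_max.sum().backward()
print(logits.grad)
```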

Upvotes: 1

dumbPy

Reputation: 1518

If you call .detach() on the prediction, the result is cut off from the computation graph, so no gradients can flow back through it. Since you first get the motor controls from the model and then try to backpropagate the error, I would suggest

prediction = policy_model(torch.from_numpy(indices))
motor_controls = prediction.clone().detach().numpy()

This keeps the prediction as it is, still attached to the graph, so the error can be backpropagated through it.
Now you can do

loss = loss_function(prediction, torch.tensor([[10.0,31.0]]).double()).float()

Note: you might need to call .double() on the prediction if it throws a dtype error.
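
Here is a self-contained sketch of the same pattern, under stated assumptions: the nn.Linear layer stands in for policy_model, the array stands in for the output of process_picture(), and the target is only a placeholder. The point is that prediction keeps its grad_fn while motor_controls is a plain NumPy copy for the robot.

```python
import numpy as np
import torch
import torch.nn as nn

policy = nn.Linear(2, 2).double()    # stand-in for policy_model
indices = np.array([14.0, 15.0])     # stand-in for process_picture()

prediction = policy(torch.from_numpy(indices))
motor_controls = prediction.clone().detach().numpy()  # plain copy for the robot

print(prediction.grad_fn is not None)
print(type(motor_controls))

# Placeholder target, used only to show that backward() now works.
target = torch.tensor([10.0, 31.0], dtype=torch.double)
loss = nn.functional.mse_loss(prediction, target)
loss.backward()
print(policy.weight.grad.shape)
```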

Upvotes: 12

nsidn98

Reputation: 1087

It is indeed impossible for autograd to calculate the gradients if the loss is not computed from the PyTorch network's output, because then the chain rule, which backpropagation uses to compute the gradients, cannot be applied.
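
When the feedback comes from something non-differentiable such as a physical robot, one common workaround is a score-function (REINFORCE-style) estimator: sample the action from a distribution parameterised by the network and weight the log-probability of that action by the observed cost. This is a swapped-in technique, not something proposed in this thread. Everything in the sketch is an assumption for illustration: the nn.Linear policy, the fixed standard deviation of 0.1, and fake_environment, which is a hypothetical stand-in for moving the robot and measuring the result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

policy = nn.Linear(2, 2)  # outputs the mean of each motor command
optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)

def fake_environment(action):
    # Hypothetical stand-in for "move the robot, take a picture":
    # works on plain numbers and returns a plain float, no autograd involved.
    a = action.tolist()
    return (a[0] - 0.3) ** 2 + (a[1] - 0.7) ** 2

state = torch.tensor([0.14, 0.15])
dist = torch.distributions.Normal(policy(state), 0.1)
action = dist.sample()            # sample() does not track gradients
cost = fake_environment(action)

# Score-function estimator: grad E[cost] ~= cost * grad log p(action).
# The gradient flows through log_prob, not through the environment.
surrogate = cost * dist.log_prob(action).sum()
optimizer.zero_grad()
surrogate.backward()
optimizer.step()
```

In practice a single sample gives a very noisy gradient estimate, so the cost is usually averaged over many steps and reduced by a baseline.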

Upvotes: 6
