Reputation: 461
I have a pretrained model which was saved by
torch.save(net, 'lenet5_mnist_model')
Now I am loading it back and trying to calculate the Fisher information matrix like this:
precision_matrices = {}
batch_size = 32
my_model = torch.load('lenet5_mnist_model')
my_model.eval() # I tried to comment this off, but still no luck
for n, p in deepcopy({n: p for n, p in my_model.named_parameters()}).items():
    p = torch.tensor(p, requires_grad = True)
    p.data.zero_()
    precision_matrices[n] = variable(p.data)

for idx in range(int(images.shape[0]/batch_size)):
    x = images[idx*batch_size : (idx+1)*batch_size]
    my_model.zero_grad()
    x = Variable(x.cuda(), requires_grad = True)
    output = my_model(x).view(1,-1)
    label = output.max(1)[1].view(-1)
    loss = F.nll_loss(F.log_softmax(output, dim=1), label)
    loss = Variable(loss, requires_grad = True)
    loss.backward()
    for n, p in my_model.named_parameters():
        precision_matrices[n].data += p.grad.data**2
Finally, the above code crashes at the last line because p.grad is None. The error is:
AttributeError: 'NoneType' object has no attribute 'data'.
Could someone provide some guidance on what caused the NoneType grad for the parameters? How should I fix this?
Upvotes: 0
Views: 2307
Reputation: 32972
Your loss does not backpropagate any gradients through the model, because you are creating a new tensor that only holds the value of the actual loss. That new tensor is a leaf of the computational graph, so there is no history to backpropagate through.
loss.backward() needs to be called on the output of loss = F.nll_loss(F.log_softmax(output, dim=1), label).
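To see the difference, here is a small illustrative sketch; the toy_model, x and target below are made up for demonstration and are not taken from your code:

import torch
import torch.nn.functional as F

# Toy setup, purely for illustration; not your actual model or data.
toy_model = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)
target = torch.tensor([0, 2])

loss = F.nll_loss(F.log_softmax(toy_model(x), dim=1), target)
print(loss.is_leaf)              # False - this tensor has a history back to the parameters

rewrapped = torch.tensor(loss.item(), requires_grad=True)
print(rewrapped.is_leaf)         # True - a brand-new leaf with no history at all
rewrapped.backward()             # runs without error, but...
print(toy_model.weight.grad)     # None - nothing reached the model's parameters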
I'm assuming you thought you need to create a tensor with requires_grad=True to be able to calculate the gradients. That is not the case. Tensors created with requires_grad=True are the leaves of the computational graph (they start the graph), and every operation performed on any tensor that is part of the graph is tracked, so that gradients can flow through the intermediate results back to the leaves. Only tensors that need to be optimised (i.e. the learnable parameters) should have requires_grad=True set manually (the model's parameters do that automatically); everything else regarding gradients is inferred. Neither x nor the loss is a learnable parameter.
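As a quick sanity check, again with a made-up toy model rather than your LeNet:

import torch
import torch.nn.functional as F

toy_model = torch.nn.Linear(4, 3)            # weight and bias already have requires_grad=True
x = torch.randn(2, 4)                        # plain input, requires_grad is False
target = torch.tensor([1, 0])

loss = F.nll_loss(F.log_softmax(toy_model(x), dim=1), target)
print(x.requires_grad, loss.requires_grad)   # False True - requires_grad was inferred for the loss
loss.backward()
print(toy_model.weight.grad.shape)           # torch.Size([3, 4]) - gradients reached the parameters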
This confusion presumably arose from the use of Variable. It was deprecated in PyTorch 0.4.0, which was released over two years ago, and all of its functionality has been merged into tensors. Please do not use Variable.
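For reference, since 0.4.0 the wrapper just returns a Tensor, so it adds nothing; a short illustration:

import torch
from torch.autograd import Variable   # deprecated; shown only for comparison

t = torch.ones(3)
print(type(Variable(t)))              # <class 'torch.Tensor'> - the wrapper is a no-op

# Modern replacements for the old Variable patterns:
w = torch.ones(3, requires_grad=True) # a leaf tensor for values you want to optimise
t.requires_grad_()                    # or flip the flag on an existing tensor in place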
The corrected version of the inner loop:
x = images[idx*batch_size : (idx+1)*batch_size]
my_model.zero_grad()
x = x.cuda()  # a plain tensor on the GPU; no Variable wrapping needed
output = my_model(x).view(1,-1)
label = output.max(1)[1].view(-1)
loss = F.nll_loss(F.log_softmax(output, dim=1), label)
loss.backward()  # called directly on the real loss, so gradients reach the parameters
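Putting it together, a minimal sketch of the whole corrected Fisher-diagonal loop could look like the following. It keeps your structure (including the .view(1, -1)) and assumes images is the same tensor of inputs as in your code and that the model runs on the GPU:

import torch
import torch.nn.functional as F

my_model = torch.load('lenet5_mnist_model')
my_model.eval()

batch_size = 32
# Zero accumulators with the same shapes as the parameters; no Variable needed.
precision_matrices = {n: torch.zeros_like(p) for n, p in my_model.named_parameters()}

for idx in range(int(images.shape[0] / batch_size)):
    my_model.zero_grad()
    x = images[idx * batch_size : (idx + 1) * batch_size].cuda()
    output = my_model(x).view(1, -1)
    label = output.max(1)[1].view(-1)
    loss = F.nll_loss(F.log_softmax(output, dim=1), label)
    loss.backward()   # backpropagate through the real loss
    for n, p in my_model.named_parameters():
        precision_matrices[n] += p.grad.detach() ** 2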
Upvotes: 2