algogator

Reputation: 97

pytorch - gradients not calculated for parameters

import torch

# stand-in setup so the snippet runs on its own
# (the real script defines these elsewhere)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x_train_tensor = torch.randn(100, device=device)
y_train_tensor = torch.randn(100, device=device)
n_epochs = 100

a = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
b = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
c = a + 1
d = torch.nn.Parameter(c, requires_grad=True)
for epoch in range(n_epochs):
    yhat = d + b * x_train_tensor
    error = y_train_tensor - yhat
    loss = (error ** 2).mean()
    loss.backward()
    print(a.grad)
    print(b.grad)
    print(c.grad)
    print(d.grad)

Prints out

None
tensor([-0.8707])
None
tensor([-1.1125])

How do I get the gradients for a and c? The variable d needs to stay a parameter.

Upvotes: 4

Views: 3002

Answers (1)

zihaozhihao

Reputation: 4495

Basically, when you create a new tensor, e.g. with torch.nn.Parameter() or torch.tensor(), you are creating a leaf node tensor.

And when you do something like c = a + 1, c will be an intermediate node. You can print(c.is_leaf) to check whether a tensor is a leaf node or not. During the backward pass PyTorch does propagate gradients through intermediate nodes, but it does not store them in .grad by default; call c.retain_grad() before backward() if you want to keep them.
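
For example, here is a minimal standalone sketch of the leaf/intermediate distinction (toy values, not from your script):

import torch

a = torch.nn.Parameter(torch.randn(1))
c = a + 1                  # intermediate node: produced by an operation
print(a.is_leaf)           # True
print(c.is_leaf)           # False
c.retain_grad()            # ask autograd to keep c's gradient
(c ** 2).backward()        # single-element tensor, so backward() needs no argument
print(a.grad)              # populated: the gradient flows back through c
print(c.grad)              # populated only because of retain_grad()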

In your code snippet, a, b and d are all leaf tensors, and c is an intermediate node. c.grad is None because PyTorch does not retain gradients for intermediate nodes. Moreover, torch.nn.Parameter(c) builds a brand-new leaf tensor from c's data, detached from the computation graph, so a never appears in the graph that loss.backward() traverses. That's why a.grad is also None.
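
You can verify the detachment directly (a small check, not from your original code):

import torch

a = torch.nn.Parameter(torch.randn(1))
c = a + 1
d = torch.nn.Parameter(c)
print(c.grad_fn)           # <AddBackward0 ...>: c is still connected to a
print(d.grad_fn)           # None: d is a fresh leaf, the link back to a is gone
print(d.is_leaf)           # True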

If you change the code to this:

# same device, training tensors and n_epochs as in the question
a = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
b = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
c = a + 1
d = c                      # d is just another name for the intermediate tensor c
for epoch in range(n_epochs):
    yhat = d + b * x_train_tensor
    error = y_train_tensor - yhat
    loss = (error ** 2).mean()
    loss.backward()
    print(a.grad) # Not None
    print(b.grad) # Not None
    print(c.grad) # None
    print(d.grad) # None

You will find that a and b have gradients, but c.grad and d.grad are None, because they are intermediate nodes: d = c merely makes d another name for the same intermediate tensor (so d is no longer an nn.Parameter).
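
If you also need c.grad populated, retain_grad() works here as well; a sketch building on the modified code above:

c.retain_grad()            # call once, before the training loop
for epoch in range(n_epochs):
    yhat = d + b * x_train_tensor
    error = y_train_tensor - yhat
    loss = (error ** 2).mean()
    loss.backward()
    print(c.grad)          # now populated (d.grad reports the same value, since d is c)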

Upvotes: 3
