Reputation: 97
a = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
b = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
c = a + 1
d = torch.nn.Parameter(c, requires_grad=True)

for epoch in range(n_epochs):
    yhat = d + b * x_train_tensor
    error = y_train_tensor - yhat
    loss = (error ** 2).mean()
    loss.backward()

print(a.grad)
print(b.grad)
print(c.grad)
print(d.grad)
Prints out
None
tensor([-0.8707])
None
tensor([-1.1125])
How do I get the gradients for a and c? Variable d needs to stay a parameter.
Upvotes: 4
Views: 3002
Reputation: 4495
Basically, when you create a new tensor, e.g. with torch.nn.Parameter() or torch.tensor(), you are creating a leaf node tensor. When you do something like c = a + 1, c will be an intermediate node. You can print(c.is_leaf) to check whether a tensor is a leaf node or not. By default, PyTorch does not store the gradient of an intermediate node in its .grad field.
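A minimal, self-contained sketch of the distinction (the toy loss here is just for illustration, it is not the question's training loop):

import torch

a = torch.nn.Parameter(torch.randn(1))  # created directly -> leaf node
c = a + 1                               # result of an operation -> intermediate node

print(a.is_leaf)   # True
print(c.is_leaf)   # False

(c ** 2).sum().backward()
print(a.grad)      # populated, e.g. tensor([...])
print(c.grad)      # None -- newer PyTorch versions also emit a warning here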
In your code snippet, a, b and d are all leaf node tensors, and c is an intermediate node. c.grad will be None because PyTorch doesn't store gradients for intermediate nodes. a is also isolated from the graph: wrapping c in torch.nn.Parameter() creates a new leaf tensor d whose history is cut off, so when you call loss.backward() no gradient can flow back to a. That's why a.grad is also None.
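A standalone sketch of that cut (variable names mirror the question; constructing a Parameter from a non-leaf tensor behaved this way in the question's snippet, though details may vary across PyTorch versions):

import torch

a = torch.nn.Parameter(torch.randn(1))
c = a + 1
d = torch.nn.Parameter(c)   # creates a brand-new leaf; the link back to a is cut here

print(d.is_leaf)   # True
print(d.grad_fn)   # None -- d has no history, so backward() cannot reach a

(d ** 2).sum().backward()
print(d.grad)      # populated
print(a.grad)      # None, because the graph was cut at nn.Parameter(c)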
If you change the code to this:
a = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
b = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float, device=device))
c = a + 1
d = c

for epoch in range(n_epochs):
    yhat = d + b * x_train_tensor
    error = y_train_tensor - yhat
    loss = (error ** 2).mean()
    loss.backward()

print(a.grad)  # Not None
print(b.grad)  # Not None
print(c.grad)  # None
print(d.grad)  # None
You will find that a and b have gradients, but c.grad and d.grad are None, because they are intermediate nodes.
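As a side note (the fix above does not rely on this): if you really need the gradient of the intermediate node c itself, you can call Tensor.retain_grad() on it before backward(), e.g.:

c = a + 1
c.retain_grad()   # ask autograd to also store .grad for this intermediate node
d = c
# ... same training loop as above ...
# after loss.backward(), c.grad is populated (d is the same tensor, so d.grad too)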
Upvotes: 3