Reputation: 758
I need some conceptual clarity about the inputs of PyTorch's grad() function. Please see the following code:
import torch
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 1/3*a**3 - 1/2*b**2
Here, I have defined 3 tensors, and I am trying to compute the derivative of Q w.r.t. a.
The following lines compute the first and second derivatives.
Q_a = torch.autograd.grad(Q.sum(), a, create_graph=True)[0]
Q_aa = torch.autograd.grad(Q_a.sum(), a, create_graph=True)[0]
print('Q_a =',Q_a.detach().numpy())
print('Q_aa =',Q_aa.detach().numpy())
The output is:
Q_a = [4. 9.]
Q_aa = [4. 6.]
I am wondering why I need to pass Q.sum() or Q_a.sum(), which is just one value, when the second argument a has two values.
>>> print(Q.sum())
tensor(-14.3333, grad_fn=<SumBackward0>)
>>> print(a)
tensor([2., 3.], requires_grad=True)
Can someone explain to me how Q.sum() helps in computing the correct gradient? Is it possible to compute the derivatives with just Q, not Q.sum()?
Upvotes: 1
Views: 2775
Reputation: 2268
Well, your question is based on a wrong assumption. You said:
"I was trying to compute the derivative of Q w.r.t. a"
No, you are not. In the code sample you provided, you are computing the derivative of Q.sum() w.r.t. a, and they are different things.
"Derivative of Q
w.r.t a
" is matrix called Jacobian, whereas ..
"Derivative of Q.sum()
w.r.t a
" is a vector known as gradient.
Both can be computed and are used in different places for achieving different things. It's your decision which one you want.
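If the full Jacobian is what you want, one way to get it (in reasonably recent PyTorch versions) is torch.autograd.functional.jacobian. Because Q is computed elementwise, the matrix comes out diagonal, and its diagonal matches the gradient above. A sketch:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# dQ[i]/da[j]: a 2x2 matrix, diagonal because Q[i] depends only on a[i]
J = torch.autograd.functional.jacobian(lambda x: 1/3*x**3 - 1/2*b**2, a)
print(J)
# tensor([[4., 0.],
#         [0., 9.]])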
Upvotes: 4