Reputation: 758
I need some conceptual clarity about the inputs of PyTorch's grad() function. Please see the following code:
import torch
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 1/3*a**3 - 1/2*b**2
Here, I have defined 3 tensors, and I am trying to compute the derivative of Q w.r.t. a.
The following lines compute the first and second derivatives.
Q_a = torch.autograd.grad(Q.sum(), a, create_graph=True)[0]
Q_aa = torch.autograd.grad(Q_a.sum(), a, create_graph=True)[0]
print('Q_a =',Q_a.detach().numpy())
print('Q_aa =',Q_aa.detach().numpy())
The output is:
Q_a = [4. 9.]
Q_aa = [4. 6.]
I am wondering why I need to pass Q.sum() or Q_a.sum(), which is just one value, when the second argument a has two values.
>>> print(Q.sum())
tensor(-14.3333, grad_fn=<SumBackward0>)
>>> print(a)
tensor([2., 3.], requires_grad=True)
Can someone explain to me how Q.sum() helps in computing the correct gradient? Is it possible to compute the derivatives with just Q, not Q.sum()?
Upvotes: 1
Views: 2775
Reputation: 2268
Well, your question is based on a wrong assumption. You said:
"I was trying to compute the derivative of Q w.r.t. a"
No, you are not. In the code sample you provided, you are computing the derivative of Q.sum() w.r.t. a, and they are different things.
"Derivative of Q
w.r.t a
" is matrix called Jacobian, whereas ..
"Derivative of Q.sum()
w.r.t a
" is a vector known as gradient.
Both can be computed and are used in different places for achieving different things. It's your decision which one you want.
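If the full Jacobian is what you want, one way to get it (in reasonably recent PyTorch versions) is torch.autograd.functional.jacobian. Because Q is computed elementwise, the matrix comes out diagonal, and its diagonal matches the gradient above. A sketch:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# dQ[i]/da[j]: a 2x2 matrix, diagonal because Q[i] depends only on a[i]
J = torch.autograd.functional.jacobian(lambda x: 1/3*x**3 - 1/2*b**2, a)
print(J)
# tensor([[4., 0.],
#         [0., 9.]])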
Upvotes: 4