Prakhar Sharma

Reputation: 758

How does grad() work in PyTorch?

I need some conceptual clarity about the inputs of the PyTorch grad() function. Please see the following code:

import torch
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 1/3*a**3 - 1/2*b**2

Here, I have defined three tensors and I am trying to compute the derivative of Q w.r.t. a. The following lines compute the first and second derivatives:

Q_a = torch.autograd.grad(Q.sum(), a, create_graph=True)[0]
Q_aa = torch.autograd.grad(Q_a.sum(), a, create_graph=True)[0]
print('Q_a =',Q_a.detach().numpy())
print('Q_aa =',Q_aa.detach().numpy())

The output is:

Q_a = [4. 9.]
Q_aa = [4. 6.]
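
For reference, these match the analytic derivatives: the operations are elementwise, so each Q[i] depends only on a[i], giving dQ/da = a**2 = [4, 9] and d2Q/da2 = 2*a = [4, 6]. A minimal check of the arithmetic in plain NumPy:

import numpy as np

a = np.array([2., 3.])
print('Q_a  =', a**2)   # [4. 9.]  elementwise first derivative
print('Q_aa =', 2 * a)  # [4. 6.]  elementwise second derivative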

I am wondering why I need to pass Q.sum() or Q_a.sum(), which is just one value, while the second argument a has two values.

>>> print(Q.sum())
tensor(-14.3333, grad_fn=<SumBackward0>)
>>> print(a)
tensor([2., 3.], requires_grad=True)

Can someone explain how Q.sum() helps in computing the correct gradient? Is it possible to compute the derivatives with just Q, not Q.sum()?

Upvotes: 1

Views: 2775

Answers (1)

ayandas

Reputation: 2268

Well, your question is based on a wrong assumption. You said:

I was trying to compute the derivative of Q w.r.t a

No, you are not. In the code sample you provided, you are computing the derivative of Q.sum() w.r.t. a; they are different things.

"Derivative of Q w.r.t a" is matrix called Jacobian, whereas ..

"Derivative of Q.sum() w.r.t a" is a vector known as gradient.

Both can be computed, and they are used in different places to achieve different things. It's up to you to decide which one you want.
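
For concreteness, here is a minimal sketch of both computations on the tensors from the question, using the grad_outputs argument (which lets grad accept a non-scalar output by supplying the vector for a vector-Jacobian product) and torch.autograd.functional.jacobian for the full Jacobian:

import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 1/3*a**3 - 1/2*b**2

# Gradient: derivative of the scalar Q.sum() w.r.t. a, shaped like a
grad_a = torch.autograd.grad(Q.sum(), a, retain_graph=True)[0]
print(grad_a)  # tensor([4., 9.])

# Equivalent without sum(): pass the non-scalar Q plus grad_outputs=ones,
# i.e. ask autograd for the vector-Jacobian product ones @ J
vjp_a = torch.autograd.grad(Q, a, grad_outputs=torch.ones_like(Q))[0]
print(vjp_a)   # tensor([4., 9.])

# Jacobian: the full matrix of dQ[i]/da[j] (diagonal here, because the
# operations are elementwise)
def f(x):
    return 1/3*x**3 - 1/2*b**2

J = torch.autograd.functional.jacobian(f, a)
print(J)       # tensor([[4., 0.], [0., 9.]])

In other words, summing Q (or, equivalently, passing grad_outputs=torch.ones_like(Q)) collapses the rows of the Jacobian into a single gradient vector, which is why a scalar output is consistent with the two-element input a.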

Upvotes: 4
