Reputation: 1964
In the documentation of torch.autograd.grad, the parameters are described as:
outputs (sequence of Tensor) – outputs of the differentiated function.
inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad).
I tried the following:
a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
c = a+b
d = a-b
torch.autograd.grad([c, d], [a, b]) #ValueError: only one element tensors can be converted to Python scalars
torch.autograd.grad(torch.tensor([c, d]), torch.tensor([a, b])) #RuntimeError: grad can be implicitly created only for scalar outputs
I would like to get gradients of a list of tensors w.r.t another list of tensors. What is the correct way to feed the parameters?
Upvotes: 6
Views: 8304
Reputation: 4475
As the documentation of torch.autograd.grad mentions, torch.autograd.grad computes and returns the sum of gradients of outputs w.r.t. the inputs. Since your c and d are not scalar values, grad_outputs is required:
import torch
a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
a
# tensor([0.2308, 0.2388], requires_grad=True)
b
# tensor([0.6314, 0.7867], requires_grad=True)
c = a*a + b*b
d = 2*a + 4*b
torch.autograd.grad([c, d], inputs=[a, b], grad_outputs=[torch.Tensor([1., 1.]), torch.Tensor([1., 1.])])
# (tensor([2.4616, 2.4776]), tensor([5.2628, 5.5734]))
Explanation:
dc/da = 2*a = [0.2308*2, 0.2388*2]
dd/da = [2., 2.]
So the first output is dc/da*grad_outputs[0] + dd/da*grad_outputs[1] = [2.4616, 2.4776]. The same calculation gives the second output.
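To see this summation explicitly, here is a minimal sketch (not part of the original snippet) that recomputes the two per-output gradients separately and checks that the combined call returns their sum:
import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
c = a*a + b*b
d = 2*a + 4*b

v = torch.Tensor([1., 1.])  # plays the role of grad_outputs for each output

# gradient of c alone w.r.t. a, and of d alone w.r.t. a
# (retain_graph=True keeps the graph alive for the later calls)
dc_da = torch.autograd.grad(c, a, grad_outputs=v, retain_graph=True)[0]
dd_da = torch.autograd.grad(d, a, grad_outputs=v, retain_graph=True)[0]

# the combined call returns the sum of the two contributions
combined = torch.autograd.grad([c, d], [a, b], grad_outputs=[v, v])[0]
print(torch.allclose(dc_da + dd_da, combined))  # True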
If you just want the gradients of c and d w.r.t. the inputs separately, you can do this:
a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
a
# tensor([0.9566, 0.6066], requires_grad=True)
b
# tensor([0.5248, 0.4833], requires_grad=True)
c = a*a + b*b
d = 2*a + 4*b
[torch.autograd.grad(t, inputs=[a, b], grad_outputs=[torch.Tensor([1., 1.])]) for t in [c, d]]
# [(tensor([1.9133, 1.2132]), tensor([1.0496, 0.9666])),
# (tensor([2., 2.]), tensor([4., 4.]))]
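A minor variation on the snippet above (not from the original answer): grad_outputs can be built with torch.ones_like so it automatically matches each output's shape:
import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
c = a*a + b*b
d = 2*a + 4*b

# one grad call per output; torch.ones_like(t) replaces torch.Tensor([1., 1.])
# and keeps working if the output shape changes
[torch.autograd.grad(t, inputs=[a, b], grad_outputs=torch.ones_like(t)) for t in [c, d]]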
Upvotes: 3
Reputation: 143
Here you go. In the example you gave:
a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
loss = a + b
Because loss is a vector with 2 elements, you can't call torch.autograd.grad on it at once. Typically, you reduce it to a scalar first:
loss = torch.sum(a + b)
torch.autograd.grad([loss], [a, b])
This returns the correct gradients for the loss tensor, which now contains a single element.
You can also pass multiple scalar tensors to the outputs argument of torch.autograd.grad.
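A minimal sketch of that (the two losses below are made up just for illustration; the result is again the sum of the individual gradients):
import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

loss1 = torch.sum(a + b)  # scalar
loss2 = torch.sum(a - b)  # scalar

# both outputs are scalars, so no grad_outputs is needed; the result is the
# sum of each loss's gradient w.r.t. each input
grads = torch.autograd.grad([loss1, loss2], [a, b])
# grads[0] = d(loss1)/da + d(loss2)/da = tensor([2., 2.])
# grads[1] = d(loss1)/db + d(loss2)/db = tensor([0., 0.])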
Upvotes: 1