Reputation: 201
I am working on a neural network in PyTorch that simply maps points from the plane to real numbers, for example:
model = nn.Sequential(nn.Linear(2,2),nn.ReLU(),nn.Linear(2,1))
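To make the snippets below self-contained: X_train is my set of N=100 points, created with gradients enabled (the random data here is just a placeholder for my actual training set):

import torch
import torch.nn as nn

# placeholder for my actual N = 100 training points in the plane;
# requires_grad=True so the output can be differentiated w.r.t. the inputs
X_train = torch.randn(100, 2, requires_grad=True)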
Since this network defines a map h: R^2 -> R, what I want to do is compute the gradient of h inside the training loop. For example:
for it in range(epochs):
    pred = model(X_train)
    grad = torch.autograd.grad(pred, X_train)
    ....
The training set X_train has been defined as a tensor with gradients enabled, as shown above. My problem is that, even though the output for each individual point is a scalar, I am forwarding a batch of N=100 points, so the output is actually an Nx1 tensor. This leads to the error that autograd can only compute gradients of scalar outputs.
In fact, with the small change
pred = torch.sum(model(X_train))
everything works perfectly. However, I am interested in all the individual gradients, so: is there a way to compute all of these gradients together?
Computing the sum as above does give exactly the result I expect, of course, but I wanted to know whether this is the only possibility.
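For completeness, the variant that works for me looks like this:

for it in range(epochs):
    # summing collapses the (100, 1) output to a scalar, so autograd is happy
    pred = torch.sum(model(X_train))
    # grad has shape (100, 2): one gradient of h per input point
    grad, = torch.autograd.grad(pred, X_train)
    ....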
Upvotes: 1
Views: 1149
Reputation: 16470
There are other possibilities, but using .sum() is the simplest way. Calling .sum() on the output vector and then computing dpred/dinput gives you exactly the output you want. Here is why:

Since pred = sum(f(x_i)) = f(x_0) + f(x_1) + ..., where i is the index of the input point x_i, dpred/dinput is the matrix [dpred/dx_0, dpred/dx_1, ...].

Consider dpred/dx_0: it is equal to df(x_0)/dx_0, since every other term df(x_i)/dx_0 with i != 0 is 0.
PS: Please excuse the crappy mathematical expressions... SO does not support latex/math expressions.
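If you want to avoid the explicit .sum(), one of those other possibilities is to seed the backward pass yourself via the grad_outputs argument of torch.autograd.grad. A sketch, assuming model and X_train are defined as in the question:

pred = model(X_train)  # shape (100, 1), non-scalar
# seeding the vector-Jacobian product with ones computes the same thing
# as differentiating pred.sum(): row i of grads is df(x_i)/dx_i
grads, = torch.autograd.grad(pred, X_train,
                             grad_outputs=torch.ones_like(pred))
print(grads.shape)  # torch.Size([100, 2])

The result is identical to the .sum() version for exactly the reason above: the Jacobian is block-diagonal across samples, so the cross terms contribute nothing.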
Upvotes: 0