Reputation: 15
The PyTorch code below computes a derivative. I think the output should be 18, but it's 4.5, and I don't know why:
import torch
x = torch.ones(2, 2, requires_grad=True)  # 2x2 tensor of ones, tracked by autograd
y = x + 2
z = y * y * 3
out = z.mean()   # scalar: the average of the four entries of z
out.backward()   # fills x.grad with d(out)/dx
print(x.grad)
The output: tensor([[4.5000, 4.5000], [4.5000, 4.5000]])
I think the derivative is 2 * 3 * (1 + 2) = 18, so it should be:
tensor([[18, 18],
[18, 18]])
Why is the output 4.5? Some people think it's the mean() method that divides the derivative by 4, but when I run print(out), the output is tensor(27., grad_fn=<MeanBackward0>) rather than tensor(4.5, grad_fn=<MeanBackward0>). I'm new to PyTorch, so I don't know exactly what tensor.mean() does, but since print(out) gives 27, I don't think there is a "/4" step inside tensor.mean(), so I don't think the derivative computation should include a "/4" either. Is that correct? (Please help me~)
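For reference, here is a small check I ran on the intermediate values (assuming I'm reading them right):
import torch
x = torch.ones(2, 2, requires_grad=True)
z = 3 * (x + 2) ** 2   # same as y * y * 3 above
print(z)               # tensor([[27., 27.], [27., 27.]], grad_fn=<MulBackward0>)
print(z.mean())        # tensor(27., grad_fn=<MeanBackward0>) -- this is "out"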
Upvotes: 1
Views: 139
Reputation: 1620
Here's how I think it goes:
y = x + 2 and z = y * y * 3, so z is 3 * (x + 2)^2.
Next, out = z.mean(), i.e. Σz / n, which here is Σz / 4 since z has a total of 4 numbers.
Thus, you take the derivative of Σ(3 * (x + 2)^2) / 4 at x = 1. That gives (3/4) * 2 * (x + 2), which at x = 1 is 4.5.
So I think you had it all figured out, except that in the last step you missed dividing by 4, which you need to do because of the mean() in there. As for print(out) showing 27: every entry of z is 3 * (1 + 2)^2 = 27, and the mean of four identical 27s is still 27, so the division by 4 does happen in the forward pass; it's just invisible because all the entries are equal.
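To double-check that closed form against autograd, here's a minimal sketch using the same numbers as your code:
import torch
x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()   # same computation in one line
out.backward()
print(x.grad)                 # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])
print((3 / 4) * 2 * (1 + 2))  # 4.5 -- the hand-derived value at x = 1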
Edit:
Since you're confused about how the mean() affects the output, let's use a tensor of values [1, 2, 3, 4] instead of torch.ones() to see the effect.
x = torch.tensor([1.0,2.0,3.0,4.0], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()
print(x.grad)
this will output
tensor([4.5000, 6.0000, 7.5000, 9.0000])
How? Remember we derived the expression for the derivative to be (3/4) * 2 * (x + 2). Substitute x = 1 and you get 4.5000. Then for x = 2 you get 6.0000, for x = 3 you get 7.5000, and so on.
In the earlier example you had four instances of x = 1, which is why x.grad was [[4.5, 4.5], [4.5, 4.5]].
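And as a final sanity check, evaluating the closed form at x = [1, 2, 3, 4] reproduces the autograd output exactly:
import torch
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
print((3 / 4) * 2 * (x + 2))  # tensor([4.5000, 6.0000, 7.5000, 9.0000]) -- matches x.grad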
Upvotes: 2