Reputation: 66
I have some example code. When I calculate dloss/dw by hand I get 8, but the following code gives me 16. Please tell me how the gradient is 16.
import torch
x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
# forward
y_hat = w * x
s = y_hat - y
loss = s**2
# backward
loss.backward()
print(w.grad)
Upvotes: 0
Views: 1791
Reputation: 156
I think you simply miscalculated. The derivative of loss = (w * x - y)^2 with respect to w is:
dloss/dw = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
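As a quick numeric check (just a sketch, reusing the same x, y, w values from your code), you can compare the hand-derived formula with what autograd reports:
import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

loss = (w * x - y) ** 2
loss.backward()

manual = 2 * (w * x - y) * x          # 2 * (3 * 2 - 2) * 2 = 16
print(manual.item(), w.grad.item())   # 16.0 16.0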
Keep in mind that back-propagation in neural networks is done by applying the chain rule; I think you forgot the * x at the end of the derivative.
To be specific: the chain rule says that df(g(x))/dx = f'(g(x)) * g'(x) (differentiating with respect to x).
The whole loss function in your case is built like this:
loss(y_hat) = (y_hat - y)^2
y_hat(x) = w * x
Thus loss(y_hat(x)) = (y_hat(x) - y)^2, and according to the chain rule:
dloss(y_hat(x))/dw = loss'(y_hat(x)) * dy_hat(x)/dw
For any z: loss'(z) = 2 * (z - y), and dy_hat(x)/dw = x.
Thus:
dloss(y_hat(x))/dw = loss'(y_hat(x)) * dy_hat(x)/dw = 2 * (y_hat(x) - y) * x = 2 * (w * x - y) * x = 16
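If you want to see those chain-rule factors inside autograd itself, here is a sketch that reuses your forward pass and calls retain_grad() on the intermediate tensors so their gradients are kept:
import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

y_hat = w * x        # dy_hat/dw = x
y_hat.retain_grad()  # keep the gradient of this non-leaf tensor
s = y_hat - y        # ds/dy_hat = 1
s.retain_grad()
loss = s ** 2        # dloss/ds = 2 * s

loss.backward()

print(s.grad)      # dloss/ds     = 2 * 4         -> tensor(8.)
print(y_hat.grad)  # dloss/dy_hat = 8 * 1         -> tensor(8.)
print(w.grad)      # dloss/dw     = 8 * x = 8 * 2 -> tensor(16.)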
PyTorch records that each operation in your forward pass applies some function to its input, and that your forward pass as a whole computes loss(y_hat(x)); it then keeps applying the chain rule during the backward pass (each operation requires one application of the chain rule).
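You can peek at that recorded chain of operations through grad_fn (just an illustration; the exact backward-node names can differ between PyTorch versions). Continuing from your snippet:
print(loss.grad_fn)                 # node for loss = s**2, e.g. <PowBackward0 ...>
print(loss.grad_fn.next_functions)  # points to the node for s = y_hat - y
print(loss.grad_fn.next_functions[0][0].next_functions)  # ...which points to the node for y_hat = w * x (and None for y)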
Upvotes: 1