Usama Ahmed

How is the gradient calculated in PyTorch?

I have some example code. When I calculate dloss/dw by hand I get 8, but the following code gives me 16. Please explain to me how the gradient comes out to 16.

import torch
x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
# forward
y_hat = w * x
s = y_hat - y
loss = s**2
# backward
loss.backward()
print(w.grad)


Answers (1)

Jo Bo

I think you simply miscalculated. The derivative of loss = (w * x - y) ^ 2 with respect to w is:

dloss/dw = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
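
A quick way to verify this is to compute the hand-derived formula next to what autograd reports; a minimal sketch reusing the question's tensors:

import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

loss = (w * x - y) ** 2
loss.backward()

# hand-derived gradient: dloss/dw = 2 * (w * x - y) * x
with torch.no_grad():
    manual = 2 * (w * x - y) * x

print(w.grad)  # tensor(16.)
print(manual)  # tensor(16.)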

Keep in mind that back-propagation in neural networks is done by applying the chain rule; I think you forgot the * x at the end of the derivative.

To be specific: the chain rule says that df(g(x))/dx = f'(g(x)) * g'(x) (differentiated with respect to x).
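
If you want to convince yourself that the chain rule gives the right number here, a small finite-difference check in plain Python (with f and g standing in for the loss and y_hat functions defined below) should agree:

# finite-difference check of df(g(w))/dw = f'(g(w)) * g'(w)
x, y = 2.0, 2.0
f = lambda z: (z - y) ** 2   # the squared-error part
g = lambda w: w * x          # the y_hat part

w0, eps = 3.0, 1e-6
numeric = (f(g(w0 + eps)) - f(g(w0 - eps))) / (2 * eps)
analytic = 2 * (g(w0) - y) * x  # f'(g(w0)) * g'(w0)
print(numeric, analytic)        # both ~16.0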

The whole loss function in your case is built like this: loss(y_hat) = (y_hat - y)^2, where y_hat(w) = w * x.

Thus loss(y_hat(w)) = (y_hat(w) - y)^2, and according to the chain rule its derivative is: dloss(y_hat(w))/dw = loss'(y_hat(w)) * dy_hat(w)/dw

For any z: loss'(z) = 2 * (z - y) * 1 = 2 * (z - y), and dy_hat(w)/dw = x.

Thus: dloss(y_hat(w))/dw = loss'(y_hat(w)) * dy_hat(w)/dw = 2 * (y_hat(w) - y) * x = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
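
You can also ask autograd for the intermediate factors of exactly this chain by calling retain_grad() on the non-leaf tensors from your code; a minimal sketch:

import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

y_hat = w * x
y_hat.retain_grad()  # keep gradients of intermediate (non-leaf) tensors
s = y_hat - y
s.retain_grad()
loss = s ** 2
loss.backward()

print(s.grad)      # dloss/ds     = 2 * s           -> tensor(8.)
print(y_hat.grad)  # dloss/dy_hat = 2 * (y_hat - y) -> tensor(8.)
print(w.grad)      # dloss/dw     = 8 * x           -> tensor(16.)

Note that dloss/dy_hat is 8, which is presumably the number you computed by hand; multiplying by the last factor x turns it into 16.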

PyTorch knows that in your forward pass each operation applies some kind of function to its input, and that your forward pass computes loss(y_hat(w)); it then keeps applying the chain rule for the backward pass (each operation requires one application of the chain rule).
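
You can see that recorded chain of functions by walking the graph from loss.grad_fn back to w; a minimal sketch (the exact node class names are an internal detail and may differ between PyTorch versions):

import torch

x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)

loss = (w * x - y) ** 2

# each node corresponds to one chain-rule application in backward()
node = loss.grad_fn
while node is not None:
    print(type(node).__name__)  # PowBackward0, SubBackward0, MulBackward0, AccumulateGrad
    node = node.next_functions[0][0] if node.next_functions else None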

