PuffedRiceCrackers

Reputation: 785

Why torch.sum() before doing .backward()?

I can see what the code below (from this video) is trying to do. But the sum in y = torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor with a single value. Since I understand .backward() as calculating derivatives, why would we want to use sum and reduce y to one value?

import torch
import matplotlib.pyplot as plt 
x = torch.linspace(-10.0,10.0,10, requires_grad=True)
Y = x**2
y = torch.sum(x**2)     
y.backward()

plt.plot(x.detach().numpy(), Y.detach().numpy(), label="Y")
plt.plot(x.detach().numpy(), x.grad.detach().numpy(), label="derivatives")
plt.legend()

Upvotes: 10

Views: 6663

Answers (2)

Jonas De Schouwer

Reputation: 913

You have a tensor Y, which has been computed directly or indirectly from tensor X.

Y.backward() would calculate the derivative of each element of Y w.r.t. each element of X. This gives us N_out (the number of elements in Y) gradient masks, each with shape X.shape.

However, Tensor.backward() requires that the gradient stored in X.grad have the same shape as X. If N_out=1, there is no problem, since we have only one mask. That is why you want to reduce Y to a single value first.

If N_out>1, PyTorch takes a weighted sum over the N_out gradient masks. But you need to supply the weights for this weighted sum! You can do this with the gradient argument:
Y.backward(gradient=weights_shaped_like_Y)

If you give every element of Y weight 1, you will get the same behaviour as using torch.sum(Y).backward().
Hence, the following two programs are equivalent:

x = torch.linspace(-10.0,10.0,10, requires_grad=True)
Y = x**2
y = torch.sum(x**2)     
y.backward()

and

x = torch.linspace(-10.0,10.0,10, requires_grad=True)
Y = x**2   
Y.backward(gradient=torch.ones_like(Y))
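As a quick sanity check (a minimal sketch, not part of the original answer), you can run both variants on fresh tensors and compare the resulting x.grad against the analytic derivative 2x:

```python
import torch

# Run 1: reduce to a scalar with sum, then backward
x1 = torch.linspace(-10.0, 10.0, 10, requires_grad=True)
torch.sum(x1 ** 2).backward()

# Run 2: keep the output vector-valued and supply unit weights
x2 = torch.linspace(-10.0, 10.0, 10, requires_grad=True)
(x2 ** 2).backward(gradient=torch.ones(10))

# Both match the analytic derivative d(x^2)/dx = 2x
print(torch.allclose(x1.grad, x2.grad))          # True
print(torch.allclose(x1.grad, 2 * x1.detach()))  # True
```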

Upvotes: 3

Shai

Reputation: 114786

You can only compute partial derivatives of a scalar function. What backward() gives you is d loss/d parameter, and you expect a single gradient value per parameter/variable.
Had your loss function been a vector function, i.e., mapping from multiple inputs to multiple outputs, you would have ended up with multiple gradients per parameter/variable.
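To make this concrete (a small sketch, assuming a toy vector output y = x**2): calling backward() on a non-scalar without supplying weights raises a RuntimeError, while reducing to a scalar first yields one gradient value per input:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2  # vector-valued: one output per input

try:
    y.backward()  # no weights supplied for a non-scalar output
except RuntimeError as e:
    print("backward() on a non-scalar raises:", e)

# Reducing to a scalar first works: x.grad holds one value per input
y.sum().backward()
print(x.grad)  # tensor([2., 4., 6.])
```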

Please see this answer for more information.

Upvotes: 12
