Muhammad Fhadli

Reputation: 361

How does the PyTorch loss connect to the model parameters?

I know that in PyTorch the optimizer is connected to the model's parameters by

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

and inside the training loop we do the backward pass and update the parameters by executing these two lines

loss.backward()
optimizer.step()

But how does the loss actually connect to the model parameters? We only define the connection between the optimizer and the model, and never define a connection between the loss and the model.

And when we execute loss.backward(), how does PyTorch know that it should do backpropagation through our model?

I put the full code here for context:

import torch
import torch.nn as nn

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
X_test = torch.tensor([[5]], dtype=torch.float32)

n_sample, n_feature = X.shape
input_size = n_feature
output_size = n_feature

model = nn.Linear(input_size, output_size)

# Training
learning_rate = 0.01
n_iters = 100

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# print(model(X_test))
print(f"Prediction before training f(5) = {model(X_test).item():.3f}")

for epoch in range(n_iters):
  y_pred = model(X)

  # compute loss (MSELoss is symmetric, but the conventional argument order is loss(y_pred, Y))
  l = loss(Y, y_pred)

  # backward pass: compute gradients of the loss w.r.t. the model parameters
  l.backward()

  # update the parameters using the gradients stored by backward()
  optimizer.step()

  # reset the gradients to zero for the next iteration
  optimizer.zero_grad()

  if epoch % 10 == 0:
    w, b = model.parameters()
    # print(model.parameters())
    print(f"Epoch {epoch + 1}, w = {w[0][0].item():.3f}, loss = {l:.5f}")

print(f"Prediction after training f(5) = {model(X_test).item():.3f}")

Upvotes: 3

Views: 1161

Answers (1)

Scriddie

Reputation: 3821

Q: When we execute loss.backward(), how does PyTorch know that we will do backpropagation for our model?

In the line l = loss(Y, y_pred), the predictions are used to calculate the loss. This effectively connects the model parameters to the loss, so that loss.backward() can backpropagate through the network and compute the parameter gradients. Note that the model's parameters have requires_grad=True, while this is not the case for the labels, which do not need gradients. Through l.backward(), every tensor that went into the loss calculation and requires a gradient (in our case, the model parameters) gets a gradient stored in its grad attribute. See the documentation for the grad attribute.
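To make this visible, here is a minimal sketch (a toy nn.Linear setup analogous to the one in the question, not the author's exact code) showing that the prediction carries a grad_fn linking it into the autograd graph, and that backward() fills in the grad attribute of the parameters:

import torch
import torch.nn as nn

# toy setup mirroring the question: a 1-in / 1-out linear model
model = nn.Linear(1, 1)
X = torch.tensor([[1.0], [2.0]])
Y = torch.tensor([[2.0], [4.0]])

y_pred = model(X)            # y_pred has a grad_fn: it is a node in the autograd graph
l = nn.MSELoss()(y_pred, Y)  # the loss is built from y_pred, so the graph reaches the parameters

print(y_pred.requires_grad)  # True  -- it depends on parameters that require gradients
print(Y.requires_grad)       # False -- the labels are plain data
print(l.grad_fn)             # something like <MseLossBackward0 ...>, the last node of the graph

print(model.weight.grad)     # None -- no gradient has been computed yet
l.backward()                 # walks the graph back to every leaf with requires_grad=True
print(model.weight.grad)     # now a tensor holding dl/dw, stored on the parameter itself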

Q: But how does the loss actually connect to the model parameters?

The statement optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) connects the optimizer to the model parameters. Since the gradients computed by loss.backward() are stored as attributes (.grad) of those same parameter tensors, they are accessible to the optimizer when optimizer.step() is called.
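Conceptually, plain SGD without momentum or weight decay reduces to something like the sketch below (using the model and learning_rate from the question's code; this is an illustration of the idea, not PyTorch's actual implementation). The optimizer only needs the parameter tensors it was handed at construction time, because backward() has already left the gradients on those same tensors:

import torch

# rough equivalent of optimizer.step() for vanilla SGD:
# read the .grad stored on each parameter and take a step against it
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= learning_rate * param.grad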

Upvotes: 4
