Prithviraj Kanaujia

Reputation: 361

How is loss.backward() calculated in PyTorch Lightning?

I understand that when training_step() returns loss, the automatic optimization code (link) takes care of calling loss.backward().

Can someone tell me what the difference would be in loss.backward() under automatic optimization for the following two training_step() scenarios:

Scenario 1:

def training_step(self, batch: list, batch_idx):
    x, y = batch
    output = self.model(x)
    loss = self.loss_func(output, y)

    return loss

Scenario 2:

def training_step(self, batch: list, batch_idx):
    x, y = batch
    output = self.model(x)
    loss = self.loss_func(output, y)
    metric = self.metric(output, y)

    train_log = {"loss": loss, "metric": metric}

    return train_log

My worry is that loss.backward() in the second scenario does backward for both loss and metric instead of just loss.

I opened the pytorch-lightning files in my conda environment to understand how automatic optimization handles a dictionary instead of a Tensor, but it didn't lead to much.

Any help/hint is appreciated. Thanks!

Upvotes: 0

Views: 669

Answers (1)

Johannes R.

Reputation: 16

When you call loss.backward(), autograd walks backwards through the computation graph, calculates the gradient of the loss with respect to each trainable parameter, and accumulates it in parameter.grad (the parameters themselves are not changed; that is done by optimizer.step()). So unless you call metric.backward(), metric will not affect the gradients. In fact, you can calculate as many metrics as you want this way without affecting the gradients. This answer might also be helpful: https://discuss.pytorch.org/t/what-does-the-backward-function-do/9944/2
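A minimal plain-PyTorch sketch (not Lightning-specific, with a made-up one-parameter model) illustrating this: only the tensor you call .backward() on contributes to parameter.grad, while a separately computed metric leaves the gradients untouched.

```python
import torch

# A single trainable parameter standing in for a model (hypothetical example).
w = torch.nn.Parameter(torch.tensor(2.0))
x = torch.tensor(3.0)
y = torch.tensor(7.0)

output = w * x
loss = (output - y) ** 2              # differentiable loss
metric = (output - y).abs().detach()  # a metric; we never call backward() on it

loss.backward()  # accumulates d(loss)/dw into w.grad

# Only the loss contributed:
# d/dw (w*x - y)^2 = 2*(w*x - y)*x = 2*(6 - 7)*3 = -6
print(w.grad)  # tensor(-6.)
```

Note that the metric does not even need to be detached for this to hold; detaching is just a common habit to avoid keeping the metric's part of the graph alive.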

Upvotes: 0

Related Questions