Reputation: 175
import torch
import torchvision.models as models
model = models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(1, 3, 224, 224)
y = torch.randn(1, 3, 224, 224)
# 1st approach
loss1 = model(x).mean()
loss2 = model(y).mean()
(loss1 + loss2).backward()
optimizer.step()
I want to forward two batches of data through the model, use their total loss for the backward pass, and update a single model. Is this approach correct?
# 2nd approach
loss1 = model(x).mean()
loss1.backward()
loss2 = model(y).mean()
loss2.backward()
optimizer.step()
And what is the difference between the first and the second approach?
Upvotes: 2
Views: 1794
Reputation: 11638
Both of them are actually equivalent: the gradient gets accumulated additively during backpropagation (which is a convenient implementation for nodes that appear multiple times in the computation graph), so both approaches end up with identical accumulated gradients.
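For illustration, here is a minimal sketch (using a toy tensor instead of the ResNet above) of how backward() adds into .grad rather than overwriting it:

import torch

# Calling backward() twice without zeroing the gradient simply adds
# the new gradient on top of the existing one.
w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
print(w.grad)   # tensor([2., 2., 2.])

(w * 2).sum().backward()
print(w.grad)   # tensor([4., 4., 4.]) -- accumulated, not overwritten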
But to keep the code readable and make it really obvious what is happening, I would prefer the first approach. The second method relies on (one could say "abuses") that gradient-accumulation effect - it is not actually abuse and it is fairly common, but as I said, in my opinion the first way is much easier to read.
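If you want to convince yourself of the equivalence, here is a small sketch (assuming a tiny linear model instead of the ResNet above, purely to keep the two forward passes cheap); the accumulated gradients from both approaches match:

import copy
import torch

torch.manual_seed(0)
model_a = torch.nn.Linear(4, 1)
model_b = copy.deepcopy(model_a)   # identical copy so the gradients are comparable
x = torch.randn(2, 4)
y = torch.randn(2, 4)

# 1st approach: sum the losses, then one backward pass
(model_a(x).mean() + model_a(y).mean()).backward()

# 2nd approach: two backward passes, gradients accumulate in .grad
model_b(x).mean().backward()
model_b(y).mean().backward()

for p_a, p_b in zip(model_a.parameters(), model_b.parameters()):
    print(torch.allclose(p_a.grad, p_b.grad))   # True for every parameter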
Upvotes: 5