Reputation: 385
When training neural-network models in PyTorch, does it make a difference where we place the backward call? For example, which of the two snippets below is correct?
Calculate the gradient for every batch:
for e in range(epochs):
    loss_sum = 0                        # reset the running loss at the start of each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()         # .item() detaches; the running total is for logging only
        nn_model.zero_grad()            # clear the previous batch's gradients
        loss.backward()                 # backpropagate this batch's loss
        optimizer.step()                # update the weights once per batch
    loss_list.append(loss_sum / num_train_obs)
Calculate the gradient across the whole epoch:
for e in range(epochs):
    loss_sum = 0                        # reset at the start of each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss                # accumulate the loss tensor (not .item()) so it can be backpropagated
    nn_model.zero_grad()                # clear gradients once per epoch
    loss_sum.backward()                 # backpropagate the summed epoch loss
    optimizer.step()                    # update the weights once per epoch
    loss_list.append(loss_sum.item() / num_train_obs)
Upvotes: 4
Views: 1733
Reputation: 16450
Both are programmatically correct.
The first one is mini-batch gradient descent: the weights are updated after every batch. The second one accumulates the loss over the whole epoch and updates only once per epoch, which is full-batch gradient descent. For most problems the mini-batch version is what you want, so the first snippet is the usual approach, and it also tends to train faster.
You may use the second approach if you really want full-batch gradient descent (which is seldom preferable when mini-batch updates are available). Be aware, though, that because backward() is called only once per epoch, autograd has to keep the computation graph of every batch in memory until the end of the epoch, so you may run out of memory. (zero_grad() only clears the accumulated gradients, not the graph.)
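If you do want a single weight update per epoch without holding every batch's graph in memory, a common pattern is gradient accumulation: call backward() on each batch's loss (which frees that batch's graph right away) and only call optimizer.step() once, at the end of the epoch. A minimal sketch, reusing the names from your code (nn_model, batches_list, loss_function, actual, optimizer, num_train_obs are assumed to be defined as in the question):
loss_list = []
for e in range(epochs):
    loss_sum = 0.0
    nn_model.zero_grad()                # clear gradients once, at the start of the epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()         # float running total, for logging only
        loss.backward()                 # gradients accumulate in .grad; this batch's graph is freed
    optimizer.step()                    # one update per epoch, using the summed gradients
    loss_list.append(loss_sum / num_train_obs)
By linearity of differentiation, summing the per-batch gradients this way gives the same update as backpropagating the summed loss once, but peak memory stays at a single batch's graph.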
Upvotes: 3