Reputation: 385
When training neural-network models in PyTorch, does it make a difference where we place the backward call? For example, which of the two snippets below is correct?
Calculate the gradient for every batch:
for e in range(epochs):
    loss_sum = 0                        # reset the running loss at the start of each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()         # .item() detaches; the running total is for logging only
        nn_model.zero_grad()            # clear the previous batch's gradients
        loss.backward()                 # backpropagate this batch's loss
        optimizer.step()                # update the weights once per batch
    loss_list.append(loss_sum / num_train_obs)
Calculate the gradient across the whole epoch:
for e in range(epochs):
    loss_sum = 0                        # reset at the start of each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss                # accumulate the loss tensor (not .item()) so it can be backpropagated
    nn_model.zero_grad()                # clear gradients once per epoch
    loss_sum.backward()                 # backpropagate the summed epoch loss
    optimizer.step()                    # update the weights once per epoch
    loss_list.append(loss_sum.item() / num_train_obs)
Upvotes: 4
Views: 1733
Reputation: 16450
Both are programmatically correct.
The first one is mini-batch gradient descent: the weights are updated after every batch. The second one accumulates the loss over the whole epoch and updates only once per epoch, which is full-batch gradient descent. For most problems the mini-batch version is what you want, so the first snippet is the usual approach, and it also tends to train faster.
You may use the second approach if you really want full-batch gradient descent (which is seldom preferable when mini-batch updates are available). Be aware, though, that because backward() is called only once per epoch, autograd has to keep the computation graph of every batch in memory until the end of the epoch, so you may run out of memory. (zero_grad() only clears the accumulated gradients, not the graph.)
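If you do want a single weight update per epoch without holding every batch's graph in memory, a common pattern is gradient accumulation: call backward() on each batch's loss (which frees that batch's graph right away) and only call optimizer.step() once, at the end of the epoch. A minimal sketch, reusing the names from your code (nn_model, batches_list, loss_function, actual, optimizer, num_train_obs are assumed to be defined as in the question):
loss_list = []
for e in range(epochs):
    loss_sum = 0.0
    nn_model.zero_grad()                # clear gradients once, at the start of the epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()         # float running total, for logging only
        loss.backward()                 # gradients accumulate in .grad; this batch's graph is freed
    optimizer.step()                    # one update per epoch, using the summed gradients
    loss_list.append(loss_sum / num_train_obs)
By linearity of differentiation, summing the per-batch gradients this way gives the same update as backpropagating the summed loss once, but peak memory stays at a single batch's graph.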
Upvotes: 3