wwj123

Reputation: 385

PyTorch - Should the backward() call be inside the batch loop or the epoch loop?

When training neural network models with PyTorch, does it matter where we call the backward() method? For example, which of the two versions below is correct?

Calculate the gradient per batch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        nn_model.zero_grad()
        loss.backward()
        optimizer.step()
    loss_list.append(loss_sum / num_train_obs)

Calculate the gradient per epoch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum = loss_sum + loss    # keep the tensor (not .item()) so backward() can be called on it
    nn_model.zero_grad()
    loss_sum.backward()
    optimizer.step()
    loss_list.append(loss_sum.item() / num_train_obs)

Upvotes: 4

Views: 1757

Answers (1)

Umang Gupta

Reputation: 16500

Both are programmatically correct.

The first one is mini-batch gradient descent (one parameter update per batch), and the second one is full-batch gradient descent (one update per epoch). For most problems mini-batch gradient descent is what you want, so the first one is the right approach, and it will usually also train faster.

You may use the second approach if you really want a single update per epoch (but this is seldom preferable when you can update per batch). However, because the per-batch losses are accumulated into one tensor and backward() is only called once at the end of the epoch, the computation graphs of all batches stay in memory until then, so you may run out of memory.
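If you do want a single parameter update per epoch without holding every batch's graph in memory, one option is gradient accumulation: call backward() on each batch's loss (so its graph is freed right away) and call optimizer.step() only once per epoch. Since the gradient of a sum equals the sum of the gradients, this gives the same update as backpropagating the summed loss. A minimal sketch, assuming nn_model, loss_function, optimizer, batches_list, actual, epochs, loss_list, and num_train_obs are defined as in the question:

for e in range(epochs):
    loss_sum = 0.0
    optimizer.zero_grad()              # clear gradients once per epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss.backward()                # frees this batch's graph; gradients accumulate in .grad
        loss_sum += loss.item()
    optimizer.step()                   # one update using the accumulated gradients
    loss_list.append(loss_sum / num_train_obs)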

Upvotes: 3
