wwj123

Reputation: 385

PyTorch - Should the backward() call be inside the batch loop or the epoch loop?

When training neural-network models with PyTorch, does it matter where we call the backward() method? For example, which of the two snippets below is correct?

Calculate the gradient for each batch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        optimizer.zero_grad()   # reset gradients for this batch
        loss.backward()         # backward pass per batch
        optimizer.step()        # parameter update per batch
    loss_list.append(loss_sum / num_train_obs)

Calculate the gradient over the whole epoch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum = loss_sum + loss   # accumulate tensors (not .item()) so backward() works
    optimizer.zero_grad()            # reset gradients once per epoch
    loss_sum.backward()              # backward pass over the whole epoch's loss
    optimizer.step()                 # single parameter update per epoch
    loss_list.append(loss_sum.item() / num_train_obs)

Upvotes: 4

Views: 1733

Answers (1)

Umang Gupta

Reputation: 16450

Both are programmatically correct.

The first one is (mini-)batch gradient descent, and the second one is full-batch gradient descent. For most problems we want batch gradient descent, so the first one is the right approach. It is also likely to train faster.

You may use the second approach if you really want full-batch gradient descent (though it is seldom preferable when batch GD is available). However, since the epoch-level approach keeps every batch's computation graph alive until backward() is called at the end of the epoch, you may run out of memory.
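
Not part of the snippets above, but if you really do want a single update per epoch, a common middle ground is gradient accumulation: call backward() on each batch loss (which frees that batch's graph right away) and call optimizer.step() once per epoch. Because the gradient of a sum is the sum of the gradients, this accumulates the same gradient as loss_sum.backward() without holding every graph in memory. A minimal sketch, reusing the question's placeholder names (nn_model, batches_list, loss_function, optimizer, actual, num_train_obs):

# Sketch: accumulate gradients per batch, update once per epoch.
for e in range(epochs):
    loss_sum = 0.0
    optimizer.zero_grad()       # clear gradients once per epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss.backward()         # adds this batch's gradient to .grad, then frees its graph
        loss_sum += loss.item() # keep a float for logging only
    optimizer.step()            # single update from the accumulated gradients
    loss_list.append(loss_sum / num_train_obs)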

Upvotes: 3
