Irfan Babar

Reputation: 122

Steps for Machine Learning in PyTorch

When we define a model in PyTorch, we train it over several epochs. Within the epoch loop, what is the difference between the two following snippets of code, in which only the order of the calls differs? The two versions are:

  1. The one I found in tutorials.
  2. The code provided by my supervisor for the project.

Tutorial Version

for i in range(epochs):
    logits = model(x)
    loss = loss_fcn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Supervisor Version

for i in range(epochs):
    logits = model(x)
    loss = loss_fcn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Upvotes: 2

Views: 194

Answers (3)

trsvchn

Reputation: 8981

Here is pseudocode for one training iteration:

  1. run the model
  2. compute the loss
     <-- zero grads here...
  3. go backward (compute grads, or accumulate onto any existing grads)
  4. update the weights
     <-- ...or here
Basically, you zero the grads either before going backward or after updating the weights. Both code snippets are OK.
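
For concreteness, here is a minimal runnable sketch of that loop, using a made-up linear model and random data (not the asker's model). Only one of the two marked zero_grad placements should be active at a time:

import torch
import torch.nn as nn

# Hypothetical model, loss, and data just for illustration.
model = nn.Linear(10, 2)
loss_fcn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 10)
labels = torch.randint(0, 2, (8,))

for i in range(5):
    optimizer.zero_grad()            # <-- zero grads here...
    logits = model(x)
    loss = loss_fcn(logits, labels)
    loss.backward()                  # grads are freshly computed either way
    optimizer.step()
    # optimizer.zero_grad()          # <-- ...or here instead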

Upvotes: 0

Ibrahim

Reputation: 71

The only difference is when the gradients are cleared, i.e. when you call optimizer.zero_grad(): the first version zeros out the gradients after updating the weights (after optimizer.step()), while the second one zeros them out before computing the gradients (before loss.backward()). Both versions run fine. The only practical difference is the first iteration, where the second snippet is safer because it makes sure any residual gradients are zero before the new gradients are calculated. Check this link that explains why you would zero the gradients.

Upvotes: 1

In PyTorch, we typically want to explicitly set the gradients to zero for every mini-batch during training, before starting backpropagation (i.e., before updating the weights and biases), because PyTorch accumulates the gradients on subsequent backward passes. Regarding your question: both snippets do the same thing; the important detail is that optimizer.zero_grad() is called before loss.backward().
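
As a quick illustration of that accumulation behaviour, here is a small self-contained sketch (a single made-up scalar parameter, not from the question) showing that a second backward() call adds to the existing .grad rather than replacing it:

import torch

# One hypothetical parameter with requires_grad so .grad is tracked.
w = torch.tensor([1.0], requires_grad=True)

loss = (w * 3).sum()
loss.backward()
print(w.grad)   # tensor([3.])

loss = (w * 3).sum()
loss.backward()
print(w.grad)   # tensor([6.])  -- the new gradient was added, not overwritten

w.grad.zero_()  # what optimizer.zero_grad() does for each parameter
print(w.grad)   # tensor([0.])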

Upvotes: 0
