Reputation: 159
I was tackling the Fashion-MNIST dataset problem on Udacity. However, my implementation gives a drastically different loss compared to the solution shared by the Udacity team. I believe the only difference is my definition of the neural network; apart from that, everything is the same. I am not able to figure out the reason for such a drastic difference in loss.
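For reference, both snippets below assume a trainloader built along these lines (a sketch following the Udacity notebook; the batch size and normalization constants are my assumptions):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Convert images to tensors and normalize to roughly [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True,
                                 train=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)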
Code 1: My solution:
import torch.nn as nn
from torch import optim
images, labels = next(iter(trainloader))
model = nn.Sequential(nn.Linear(784, 256),
                      nn.ReLU(),
                      nn.Linear(256, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.NLLLoss()

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten images
        images = images.view(images.shape[0], -1)
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")
# Loss comes out to around 4000
Code 2: Official Solution:
from torch import nn, optim
import torch.nn.functional as F
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)
epochs = 5
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")
# Loss comes out to around 200
Is there any explanation for the vast difference in loss?
Upvotes: 0
Views: 116
Reputation: 26014
You forgot to zero out/clear the gradients in your implementation: PyTorch accumulates gradients into each parameter's .grad on every backward() call instead of overwriting them, so each optimizer.step() applies the sum of all gradients computed so far. That is, you are missing:
optimizer.zero_grad()
In other words, simply do:
for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        images = images.view(images.shape[0], -1)
        output = model(images)
        loss = criterion(output, labels)
        # missed this!
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")
and you are good to go!
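A minimal standalone check (my own illustration, not from the notebook) makes the accumulation visible:

import torch

w = torch.ones(1, requires_grad=True)

(w * 2).sum().backward()
print(w.grad)  # tensor([2.])

(w * 2).sum().backward()
print(w.grad)  # tensor([4.]) -- accumulated, not replaced

w.grad.zero_()  # this reset is what optimizer.zero_grad() does for every parameter
print(w.grad)  # tensor([0.])

Without that reset, every optimizer.step() in your loop applies the sum of all gradients computed since training started, which is why your loss stays around 4000 instead of dropping toward 200. Newer PyTorch versions also accept optimizer.zero_grad(set_to_none=True), which resets gradients to None instead of zero-filling them.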
Upvotes: 2