Peyman habibi

Reputation: 810

What is the right way to calculate epoch loss during training?

I am reading the PyTorch official tutorial on fine-tuning, and I have run into a problem: the calculation of the loss in each epoch.

Before this, I would calculate the loss for each batch, accumulate these batch losses, and take the mean of these values as the epoch loss (a sketch of this approach is at the end of the question). In the tutorial, however, the calculation is done as follows:

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward
    # track history only if in train
    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        loss = criterion(outputs, labels)

        # backward + optimize only if in training phase
        if phase == 'train':
            loss.backward()
            optimizer.step()

    # statistics
    running_loss += loss.item() * inputs.size(0)
    running_corrects += torch.sum(preds == labels.data)

My question is about the line running_loss += loss.item() * inputs.size(0). It multiplies the mean loss of the batch by the batch size. What is the correct way to calculate the epoch loss?

Also, what is the unit of the loss, and what is the range of its values?
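For reference, here is a minimal sketch of my earlier approach (variable names are my own):

batch_losses = []
for inputs, labels in dataloaders['train']:
    inputs = inputs.to(device)
    labels = labels.to(device)

    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)  # mean loss over the batch (default reduction)
    loss.backward()
    optimizer.step()

    batch_losses.append(loss.item())

# epoch loss = mean of the per-batch mean losses
epoch_loss = sum(batch_losses) / len(batch_losses)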

Upvotes: 2

Views: 2502

Answers (1)

Alperen Kantarcı

Reputation: 1098

Yes, that code snippet multiplies the batch mean loss by the batch size. If you want the true sum instead, you can use

torch.nn.CrossEntropyLoss(reduction="sum")

which gives you the sum of the losses over the batch. Then you can accumulate it directly for each batch:

running_loss += loss.item()
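Putting it together, here is a minimal sketch of the whole epoch statistic (dataset_sizes[phase] is assumed to hold the number of samples in the phase, as in the tutorial):

criterion = torch.nn.CrossEntropyLoss(reduction="sum")

running_loss = 0.0
for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    outputs = model(inputs)
    loss = criterion(outputs, labels)  # summed loss over the batch
    running_loss += loss.item()        # accumulate directly, no multiplication

# per-sample epoch loss; exact even when the last batch is smaller
epoch_loss = running_loss / dataset_sizes[phase]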

The range of the loss value depends on your number of classes and on the feature vector. The code in your question produces the same running_loss as reduction="sum" would, because it effectively computes

(loss/batch_size) * batch_size

which is the same as the summed loss value. However, backpropagation changes: in the one case you backpropagate the sum of the losses, in the other you backpropagate the mean loss.
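You can verify this with a small toy check (not from the tutorial; the shapes here are arbitrary): with reduction="sum" the gradients are exactly batch_size times larger than with the default reduction="mean".

import torch

torch.manual_seed(0)
logits = torch.randn(8, 5, requires_grad=True)  # batch of 8, 5 classes
labels = torch.randint(0, 5, (8,))

loss_mean = torch.nn.CrossEntropyLoss(reduction="mean")(logits, labels)
grad_mean, = torch.autograd.grad(loss_mean, logits)

loss_sum = torch.nn.CrossEntropyLoss(reduction="sum")(logits, labels)
grad_sum, = torch.autograd.grad(loss_sum, logits)

print(torch.allclose(grad_sum, grad_mean * 8))  # True

So if you switch to reduction="sum" for logging, you may want to keep a mean-reduced loss for the backward pass, or scale the learning rate accordingly.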

Upvotes: 1
