Can't backward pass two losses in Classification Transformer Model

Question

For my model I'm using a roberta transformer model and the Trainer from the Huggingface transformer library.

I calculate two losses: lloss is a Cross Entropy Loss and dloss calculates the loss inbetween hierarchy layers.

The total loss is the sum of lloss and dloss. (Based on this)

When calling total_loss.backwards() however, I get the error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

Any idea why that happens? Can I force it to only call backwards once? Here is the loss calculation part:

dloss = calculate_dloss(prediction, labels, 3)
lloss = calculate_lloss(predeiction, labels, 3)
total_loss = lloss + dloss 
total_loss.backward()

def calculate_lloss(predictions, true_labels, total_level):
    '''Calculates the layer loss.
    '''

    loss_fct = nn.CrossEntropyLoss()

    lloss = 0
    for l in range(total_level):

        lloss += loss_fct(predictions[l], true_labels[l])

    return self.alpha * lloss

def calculate_dloss(predictions, true_labels, total_level):
    '''Calculate the dependence loss.
    '''

    dloss = 0
    for l in range(1, total_level):

        current_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l]), dim=1)
        prev_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l-1]), dim=1)

        D_l = self.check_hierarchy(current_lvl_pred, prev_lvl_pred, l)  #just a boolean tensor

        l_prev = torch.where(prev_lvl_pred == true_labels[l-1], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        l_curr = torch.where(current_lvl_pred == true_labels[l], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))

        dloss += torch.sum(torch.pow(self.p_loss, D_l*l_prev)*torch.pow(self.p_loss, D_l*l_curr) - 1)

    return self.beta * dloss

FlyingTeller · Accepted Answer

There is nothing wrong with having a loss that is the sum of two individual losses, here is a small proof of principle adapted from the docs:

import torch
import numpy
from sklearn.datasets import make_blobs

class Feedforward(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Feedforward, self).__init__()
        self.input_size = input_size
        self.hidden_size  = hidden_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(self.hidden_size, 1)
        self.sigmoid = torch.nn.Sigmoid()
    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

def blob_label(y, label, loc): # assign labels
    target = numpy.copy(y)
    for l in loc:
        target[y == l] = label
    return target

x_train, y_train = make_blobs(n_samples=40, n_features=2, cluster_std=1.5, shuffle=True)
x_train = torch.FloatTensor(x_train)
y_train = torch.FloatTensor(blob_label(y_train, 0, [0]))
y_train = torch.FloatTensor(blob_label(y_train, 1, [1,2,3]))

x_test, y_test = make_blobs(n_samples=10, n_features=2, cluster_std=1.5, shuffle=True)
x_test = torch.FloatTensor(x_test)
y_test = torch.FloatTensor(blob_label(y_test, 0, [0]))
y_test = torch.FloatTensor(blob_label(y_test, 1, [1,2,3]))


model = Feedforward(2, 10)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)


model.eval()
y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
print('Test loss before training' , before_train.item())

model.train()
epoch = 20
for epoch in range(epoch):
    optimizer.zero_grad()    # Forward pass
    y_pred = model(x_train)    # Compute Loss
    lossCE= criterion(y_pred.squeeze(), y_train)
    lossSQD = (y_pred.squeeze()-y_train).pow(2).mean()
    loss=lossCE+lossSQD
    print('Epoch {}: train loss: {}'.format(epoch, loss.item()))    # Backward pass
    loss.backward()
    optimizer.step()

There must be a real second time that you call directly or indirectly backward on some varaible that then traverses through your graph. It is a bit too much to ask for the complete code here, only you can check this or at least reduce it to a minimal example (while doing so, you might already find the issue). Apart from that, I would start checking:

Does it already occur in the first iteration of training? If not: are you reusing any calculation results for the second iteration without a detach?
When you do backward on your losses individually lloss.backward() followed by dloss.backward() (this has the same effect as adding them together first as gradients are accumulated): what happens? This will let you track down for which of the two losses the error occurs.

Can't backward pass two losses in Classification Transformer Model

Answers (2)

Related Questions

Can&#39;t backward pass two losses in Classification Transformer Model

Answers (2)

Related Questions

Can't backward pass two losses in Classification Transformer Model