Madelon

Reputation: 29

Custom Loss Function becomes zero when backpropagated

I am trying to write my own custom loss function based on the false positive and false negative rates. I made some dummy code so you can check the first two definitions, and I added the rest so you can see how it is implemented. However, somewhere along the way the gradient turns out to be zero. At which step does the gradient become zero, and how can I check this? I would like to know how to fix it :). I tried to provide enough information so you can play around with it as well, but if anything is missing please let me know!

requires_grad stays True during every step. However, during training of the model the loss is not updated, so the NN does not train.

import torch
import torch.nn as nn
from torch.autograd import Variable

y = Variable(torch.tensor((0, 0, 0, 1, 1, 1), dtype=torch.float), requires_grad=True)
y_pred = Variable(torch.tensor((0.333, 0.2, 0.01, 0.99, 0.49, 0.51), dtype=torch.float), requires_grad=True)
x = Variable(torch.tensor((0, 0, 0, 1, 1, 1), dtype=torch.float), requires_grad=True)
x_pred = Variable(torch.tensor((0.55, 0.25, 0.01, 0.99, 0.65, 0.51), dtype=torch.float), requires_grad=True)

def binary_y_pred(y_pred):
    y_pred.register_hook(lambda grad: print(grad))
    y_pred = y_pred + torch.tensor(0.5, requires_grad=True, dtype=torch.float)
    y_pred = y_pred.pow(5)  # my way of working around torch.where(): approximately binarizes y_pred at a 0.5 threshold
    y_pred = y_pred.pow(10)
    y_pred = y_pred.pow(15)
    m = nn.Sigmoid()
    y_pred = m(y_pred)
    y_pred = y_pred - torch.tensor(0.5, requires_grad=True, dtype=torch.float)
    y_pred = y_pred * 2
    y_pred.register_hook(lambda grad: print(grad))
    return y_pred

def confusion_matrix(y_pred, y):
    TP = torch.sum(y*y_pred)
    TN = torch.sum((1-y)*(1-y_pred))
    FP = torch.sum((1-y)*y_pred)
    FN = torch.sum(y*(1-y_pred))

    k_eps = torch.tensor(1e-12, requires_grad=True, dtype=torch.float)
    FN_rate = FN/(TP + FN + k_eps)
    FP_rate = FP/(TN + FP + k_eps)

    return FN_rate, FP_rate

def dif_rate(FN_rate_y, FN_rate_x):
    dif = (FN_rate_y - FN_rate_x).pow(2)
    return dif

def custom_loss_function(y_pred, y, x_pred, x):
    y_pred = binary_y_pred(y_pred)
    FN_rate_y, FP_rate_y = confusion_matrix(y_pred, y)

    x_pred = binary_y_pred(x_pred)
    FN_rate_x, FP_rate_x = confusion_matrix(x_pred, x)

    FN_dif = dif_rate(FN_rate_y, FN_rate_x)
    FP_dif = dif_rate(FP_rate_y, FP_rate_x)

    cost = FN_dif + FP_dif
    return cost

# I added the rest so you can see how it is implemented, but this piece does not run on its own! If you want this part to run as well, I can add more code.
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=[0.9, 0.99], amsgrad=True)
criterion = torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
for epoch in range(num_epochs):
    train_err = 0
    for i, (samples, truths) in enumerate(train_loader):
        samples = Variable(samples)
        truths = Variable(truths)
        optimizer.zero_grad()   # Reset gradients
        outputs = model(samples)  # Do the forward pass
        loss2 = criterion(outputs, truths)  # Calculate loss

        samples_y = Variable(samples_y)
        samples_x = Variable(samples_x)

        y_pred = model(samples_y)
        y = Variable(y, requires_grad=True)

        x_pred = model(samples_x)
        x = Variable(x, requires_grad=True)

        cost = custom_loss_function(y_pred, y, x_pred, x)
        loss = loss2*0 + cost  # checking only if cost works
        loss.backward()
        optimizer.step()
        train_err += loss.item()
    train_loss.append(train_err)
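To check where the gradient actually becomes zero, one option would be to print the gradient values themselves (rather than only requires_grad) right after loss.backward(). A minimal sketch, reusing loss, y_pred and model from the loop above:

y_pred.retain_grad()  # keep the gradient of this non-leaf tensor so it can be inspected
loss.backward()
print(y_pred.grad)  # gradient of the loss w.r.t. the network output
for name, param in model.named_parameters():
    print(name, param.grad)  # all-zero (or None) grads mean optimizer.step() changes nothing
optimizer.step()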

I expect the model to update during training. There is no error message.

Upvotes: 0

Views: 116

Answers (1)

Jan

Reputation: 1240

With your definitions, TP+FN = y and TN+FP = 1-y. Then you'll get FN_rate = 1-y_pred and FP_rate = y_pred. Your cost is then FN_rate+FP_rate = 1, the gradient of which is 0.

You can check this by hand or using a library for symbolic mathematics (e.g., SymPy):

from sympy import symbols, simplify

y, y_pred = symbols("y y_pred")

TP = y * y_pred
TN = (1-y)*(1-y_pred)
FP = (1-y)*y_pred
FN = y*(1-y_pred)

# let's ignore the eps for now
FN_rate = FN/(TP + FN)
FP_rate = FP/(TN + FP)
cost = FN_rate + FP_rate

print(simplify(cost))
# output: 1
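Asking SymPy for the derivative directly confirms this (a small extension of the sketch above, still ignoring the eps term):

from sympy import diff
print(simplify(diff(cost, y_pred)))
# output: 0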

Upvotes: 1
