tahsin314

Reputation: 519

PyTorch: Loss remains constant

I've written code in PyTorch with my own loss function, focal_loss_fixed, but the loss value stays fixed after every epoch. It looks like the weights are not being updated. Here is my code snippet:

optimizer = optim.SGD(net.parameters(),
                          lr=lr,
                          momentum=0.9,
                          weight_decay=0.0005)


for epoch in T(range(20)):
    net.train()
    epoch_loss = 0
    for n in range(len(x_train)//batch_size):
        (imgs, true_masks) = data_gen_small(x_train, y_train, iter_num=n, batch_size=batch_size)
        temp = []
        for tt in true_masks:
            temp.append(tt.reshape(128, 128, 1))
        true_masks = np.copy(np.array(temp))
        del temp
        imgs = np.swapaxes(imgs, 1,3)
        imgs = torch.from_numpy(imgs).float().cuda()
        true_masks = torch.from_numpy(true_masks).float().cuda()
        masks_pred = net(imgs)
        masks_probs = F.sigmoid(masks_pred)
        masks_probs_flat = masks_probs.view(-1)
        true_masks_flat = true_masks.view(-1)
        print((focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy()))))
        loss = torch.from_numpy(np.array(focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy())))).float().cuda()
        loss = Variable(loss.data, requires_grad=True)
        epoch_loss *= (n/(n+1))
        epoch_loss += loss.item()*(1/(n+1))
        print('Step: {0:.2f}% --- loss: {1:.6f}'.format(n * batch_size* 100.0 / len(x_train), epoch_loss), end='\r')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch finished ! Loss: {}'.format(epoch_loss))

And this is my `focal_loss_fixed` function:

def focal_loss_fixed(true_data, pred_data):
    gamma=2.
    alpha=.25
    eps = 1e-7
    # print(type(y_true), type(y_pred))
    pred_data = K.clip(pred_data,eps,1-eps)
    pt_1 = tf.where(tf.equal(true_data, 1), pred_data, tf.ones_like(pred_data))
    pt_0 = tf.where(tf.equal(true_data, 0), pred_data, tf.zeros_like(pred_data))
    with tf.Session() as sess:
        return sess.run(-K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))-K.sum((1-alpha) * K.pow( pt_0, gamma) * K.log(1. - pt_0)))

After each epoch the loss value stays constant (5589.60328). What's wrong with it?

Upvotes: 0

Views: 1913

Answers (2)

dennlinger

Reputation: 11420

I think the problem lies in your heavy weight decay.

Essentially, you are not reducing the weights by x; rather, you are multiplying the weights by x, which means you are effectively making only very small updates, leading to a (seemingly) plateauing loss function.

More explanation on this can be found in the PyTorch discussion forum (e.g., here, or here).
Unfortunately, the source for SGD alone also does not tell you much about its implementation. Simply setting the weight decay to a larger value should result in better updates. You can start by leaving it out completely, and then reduce it iteratively (from 1.0) until you get more decent results.
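As a minimal sketch of that experiment (assuming `net` and `lr` are defined as in the question, not taken from the poster's code): start with plain SGD plus momentum and only reintroduce weight decay once training behaves.

import torch.optim as optim

# Plain SGD + momentum, no weight decay at all to start with.
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9)

# Later, reintroduce weight decay and tune it step by step, e.g.:
# optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=0.01)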

Upvotes: 1

Daniel

Reputation: 11

When computing the loss you call focal_loss_fixed(), which uses TensorFlow to compute the loss value. focal_loss_fixed() creates a graph and runs it in a session to get the value, and by that point PyTorch has no idea of the sequence of operations that led to the loss, because they were computed by the TensorFlow backend. It is likely, then, that all PyTorch sees in loss is a constant, as if you had written

loss = 3

So the gradient will be zero, and the parameters will never be updated. I suggest you rewrite your loss function using PyTorch operations so that the gradient with respect to its inputs can be computed.
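As an illustration (a sketch, not the poster's code), the same focal loss could be written with torch operations only, assuming `true_masks_flat` and `masks_probs_flat` are the flattened target and probability tensors from the question:

import torch

def focal_loss_torch(true_data, pred_data, gamma=2., alpha=0.25, eps=1e-7):
    # Same formula as focal_loss_fixed, but expressed with torch ops so
    # autograd can backpropagate through it into the network weights.
    pred_data = pred_data.clamp(eps, 1 - eps)
    pt_1 = torch.where(true_data == 1, pred_data, torch.ones_like(pred_data))
    pt_0 = torch.where(true_data == 0, pred_data, torch.zeros_like(pred_data))
    return (-(alpha * (1. - pt_1) ** gamma * torch.log(pt_1)).sum()
            - ((1 - alpha) * pt_0 ** gamma * torch.log(1. - pt_0)).sum())

# In the training loop, something along these lines:
# loss = focal_loss_torch(true_masks_flat, masks_probs_flat)
# optimizer.zero_grad()
# loss.backward()
# optimizer.step()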

Upvotes: 1
