Ricardo

Reputation: 81

keras nan loss when using custom mape loss

I built a simple LSTM network and used a custom MAPE loss as follows:

from tensorflow.keras import backend as K

def custom_mape(y_true, y_pred):
    # Intended behavior: where y_true == 0, use y_true (i.e. 0) as the
    # element's error; elsewhere, use the usual percentage error.
    mapes = K.switch(K.equal(y_true, 0), y_true, 100*K.abs(y_true - y_pred)/y_true)
    return K.mean(mapes, axis=-1)

The loss is nan from the very first epoch:

Model: "sequential_93"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_163 (LSTM)              (None, 14, 1)             296       
=================================================================
Total params: 296
Trainable params: 296
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
410/410 [==============================] - 3s 7ms/step - loss: nan - val_loss: nan
Epoch 2/50
410/410 [==============================] - 2s 6ms/step - loss: nan - val_loss: nan
Epoch 3/50
410/410 [==============================] - 2s 6ms/step - loss: nan - val_loss: nan
Epoch 4/50
410/410 [==============================] - 2s 6ms/step - loss: nan - val_loss: nan
Epoch 5/50
410/410 [==============================] - 2s 6ms/step - loss: nan - val_loss: nan
Epoch 6/50
410/410 [==============================] - 2s 5ms/step - loss: nan - val_loss: nan
Epoch 7/50
410/410 [==============================] - 3s 6ms/step - loss: nan - val_loss: nan
Epoch 8/50
410/410 [==============================] - 2s 5ms/step - loss: nan - val_loss: nan
Epoch 9/50
410/410 [==============================] - 2s 5ms/step - loss: nan - val_loss: nan
Epoch 10/50
410/410 [==============================] - 2s 5ms/step - loss: nan - val_loss: nan

Here is what I have tried:

  1. When I replace K.abs(y_true - y_pred)/y_true with K.abs(y_true - y_pred), the network trains normally.
  2. To rule out exploding gradients, I tried clipvalue=1, lr=0, and batch_size=1 separately. The loss remains nan in every case.

Besides, y is min-max normalized; a sample of y looks like this:

[[1.84368752e-05],
[9.86574098e-04],
[8.09853832e-04]]
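(Worth noting for context: min-max scaling maps the minimum of y to exactly 0, so y_true is guaranteed to contain at least one exact zero. A minimal illustration with made-up numbers:)

```python
import numpy as np

# Min-max normalization: the smallest sample always maps to exactly 0.0
y = np.array([3.0, 5.0, 9.0])
y_scaled = (y - y.min()) / (y.max() - y.min())
print(y_scaled)  # the first element is exactly 0.0
```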

Upvotes: 1

Views: 374

Answers (1)

Arda Keskiner

Reputation: 792

K.abs(y_true - y_pred)/y_true

Here, if y_true is 0, you would get nan because you're dividing by zero (0/0 yields nan, and x/0 yields inf). Wrapping the expression in K.switch does not help, because TensorFlow still evaluates both branches for every element, so the inf/nan from the division can still leak into the loss and its gradients. And since min-max normalization maps the minimum of y to exactly 0, your y_true is guaranteed to contain zeros.
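One common workaround (a sketch, not the original code) is to clamp the denominator away from zero with a small epsilon, so the division itself can never produce inf or nan:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

def safe_mape(y_true, y_pred):
    # Clamp the denominator to at least K.epsilon() (Keras's default
    # fuzz factor, 1e-7) so neither the forward pass nor the gradient
    # can divide by zero.
    diff = K.abs(y_true - y_pred) / K.maximum(K.abs(y_true), K.epsilon())
    return 100.0 * K.mean(diff, axis=-1)

y_true = tf.constant([[0.0], [2.0]])
y_pred = tf.constant([[0.5], [1.0]])
print(safe_mape(y_true, y_pred).numpy())  # finite for both rows
```

Note that for targets at or near zero the "percentage" error becomes huge (it is divided by epsilon), so with min-max-scaled data a plain MAE or MSE loss may be the better choice.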

Upvotes: 2

Related Questions