raceee

Reputation: 517

'loss: nan' during training of Neural Network in Keras

I am training a neural net in Keras. During the first epoch the loss value is reported normally, then suddenly becomes loss: nan before the epoch ends, and the accuracy drops significantly. From the second epoch onward the loss stays nan and the accuracy is 0. This goes on for the rest of the epochs.

The frustrating bit is that there seems to be no consistency in the output from one training run to the next; that is, the loss: nan shows up at different points in the first epoch.

There have been a couple of questions on this site that give "guides" to problems similar to this, but I haven't seen one worked through so explicitly in Keras. I am trying to get my neural network to classify a 1 or a 0.

Here are some things I have done, followed by my output and code.

Standardization // Normalization

I posted a question about my data here. I was able to figure it out and apply sklearn's StandardScaler() and MinMaxScaler() to my dataset. Neither standardization nor normalization helped.
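Roughly, this is how I applied each scaler (fitting on the training set only and then transforming the test set; the variable names here are just for illustration):

from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization: zero mean, unit variance per feature
std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)

# Normalization: rescale each feature to [0, 1]
mm = MinMaxScaler()
X_train_mm = mm.fit_transform(X_train)
X_test_mm = mm.transform(X_test)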

Learning Rate

The optimizers I have tried are adam and SGD. In both cases I tried lowering the default learning rate to see if that would help, and in both cases the same issue arose.

Activations

I thought it was pretty standard to use relu, but I saw someone on the internet talking about using tanh; tried it, no dice.

Batch Size

Tried 32, 50, 128, and 200. A batch size of 50 got me the farthest into the first epoch; the others didn't help.

Combating Overfitting

I put a dropout layer in and tried a whole range of dropout rates.

Other Observations

from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

# Scale features to [0, 1]; fit on the training set only
sc = MinMaxScaler()
X_train_total_scale = sc.fit_transform(X_train)
X_test_total_scale = sc.transform(X_test)

print(X_train_total_scale.shape)  # (4140, 2756)
print(y_train.shape)              # (4140,)


## NN
# adam = keras.optimizers.Adam(lr=0.0001)
sgd = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)

classifier = Sequential()
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu', input_dim=2756))
classifier.add(Dropout(0.6))
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

classifier.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])

classifier.fit(X_train_total_scale, y_train,
               validation_data=(X_test_total_scale, y_test),
               batch_size=50, epochs=100)


(output from a run with batch size 200 shown, to keep the text block from getting too long)

200/4140 [>.............................] - ETA: 7s - loss: 0.6866 - acc: 0.5400
 400/4140 [=>............................] - ETA: 4s - loss: 0.6912 - acc: 0.5300
 600/4140 [===>..........................] - ETA: 2s - loss: nan - acc: 0.5300   
 800/4140 [====>.........................] - ETA: 2s - loss: nan - acc: 0.3975
1000/4140 [======>.......................] - ETA: 1s - loss: nan - acc: 0.3180
1200/4140 [=======>......................] - ETA: 1s - loss: nan - acc: 0.2650
1400/4140 [=========>....................] - ETA: 1s - loss: nan - acc: 0.2271
1600/4140 [==========>...................] - ETA: 1s - loss: nan - acc: 0.1987
1800/4140 [============>.................] - ETA: 1s - loss: nan - acc: 0.1767
2000/4140 [=============>................] - ETA: 0s - loss: nan - acc: 0.1590
2200/4140 [==============>...............] - ETA: 0s - loss: nan - acc: 0.1445
2400/4140 [================>.............] - ETA: 0s - loss: nan - acc: 0.1325
2600/4140 [=================>............] - ETA: 0s - loss: nan - acc: 0.1223
2800/4140 [===================>..........] - ETA: 0s - loss: nan - acc: 0.1136
3000/4140 [====================>.........] - ETA: 0s - loss: nan - acc: 0.1060
3200/4140 [======================>.......] - ETA: 0s - loss: nan - acc: 0.0994
3400/4140 [=======================>......] - ETA: 0s - loss: nan - acc: 0.0935
3600/4140 [=========================>....] - ETA: 0s - loss: nan - acc: 0.0883
3800/4140 [==========================>...] - ETA: 0s - loss: nan - acc: 0.0837
4000/4140 [===========================>..] - ETA: 0s - loss: nan - acc: 0.0795
4140/4140 [==============================] - 2s 368us/step - loss: nan - acc: 0.0768 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/100

 200/4140 [>.............................] - ETA: 1s - loss: nan - acc: 0.0000e+00
 400/4140 [=>............................] - ETA: 0s - loss: nan - acc: 0.0000e+00
 600/4140 [===>..........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
 800/4140 [====>.........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1000/4140 [======>.......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1200/4140 [=======>......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1400/4140 [=========>....................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1600/4140 [==========>...................] - ETA: 0s - loss: nan - acc: 0.0000e+00


... and so on...

I hope to be able to get a full training run to finish (obviously), but I would also like to learn some of the intuition people use to figure out these problems on their own!

Upvotes: 0

Views: 3565

Answers (1)

joek47

Reputation: 126

Firstly, check for NaNs or inf in your dataset.
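A quick way to do that (assuming your features and labels are NumPy arrays, e.g. X_train_total_scale and y_train) would be something like:

import numpy as np

# Count NaN and +/- inf entries in the features and labels
print(np.isnan(X_train_total_scale).sum(), np.isinf(X_train_total_scale).sum())
print(np.isnan(y_train).sum(), np.isinf(y_train).sum())

# Or as a single check: True only if every value is finite
print(np.isfinite(X_train_total_scale).all(), np.isfinite(y_train).all())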

You could try different optimizers, e.g. rmsprop. The learning rate could also be smaller, though I haven't used anything lower than 0.0001 (which is what you're already using) myself.
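For example, a rough sketch reusing your existing model (not tested on your data):

from keras import optimizers

# RMSprop with a learning rate below its default of 0.001
rmsprop = optimizers.RMSprop(lr=0.0001)
classifier.compile(optimizer=rmsprop, loss='binary_crossentropy', metrics=['accuracy'])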

I thought it was pretty standard to use relu, but I saw someone on the internet talking about using tanh; tried it, no dice.

Try leaky relu or elu if you're concerned about this.
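In Keras, leaky relu is added as its own layer rather than passed as an activation string, while 'elu' can be used directly. A sketch, reusing your layer sizes:

from keras.layers import Dense, LeakyReLU

# Leaky ReLU goes after a Dense layer with no activation of its own
classifier.add(Dense(units=1379, kernel_initializer='uniform', input_dim=2756))
classifier.add(LeakyReLU(alpha=0.1))

# ELU can be passed as a plain activation string
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='elu'))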

Upvotes: 1
