Marty

Reputation: 11

Loss is NaN using Keras on the MNIST digit set

I am following an example from a data science textbook and have run into an issue: I get NaN values for the loss when running a simple Keras neural network to find the optimal learning rate.

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Get the data, split it into train/valid/test sets, and scale pixels to [0, 1]
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.


# Callback to grow the learning rate at each iteration.
# Also record learning rate and loss at each iteration.
K = keras.backend
class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        super().__init__()  # initialize the base Callback class
        self.factor = factor
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs=None):
        # Record the current rate and loss, then scale the rate up by `factor`
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)
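
To make the growth concrete (my own back-of-the-envelope check, not from the textbook): after n batches the learning rate is factor**n times its starting value, so one epoch of 1719 batches at factor=1.005 takes 1e-3 up to roughly 5.3:

# Rough check of how far the learning rate grows over one epoch
# (1719 batches as in the output below, factor 1.005, initial rate 1e-3)
print(1e-3 * 1.005 ** 1719)  # ~5.29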

# Define the model and compile/fit.
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-3),
              metrics=["accuracy"])
expon_lr = ExponentialLearningRate(factor=1.005)

history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[expon_lr])

Running this gives an output of:

1719/1719 [==============================] - 6s 4ms/step - loss: nan - accuracy: 0.6030 - val_loss: nan - val_accuracy: 0.0958

Plotting the loss against the learning rate gives the following (top is my result, bottom is the expected result from the example I am following):

[two plots of loss vs. learning rate: my result (top) and the textbook's expected result (bottom)]

Notably, the example's loss is much noisier than mine and ranges from ~2.5 down to ~0.25. My loss only ranges from ~2.5 down to exactly 1, at which point it becomes NaN.

Perhaps something in Keras/TF has been updated since this example was written, but as I am new to Keras I am wondering what the issue might be here.
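
For reference, this is roughly how I produce the plot (a minimal sketch assuming matplotlib; filtering out the non-finite losses is my own addition so the curve does not simply break off at the NaNs):

import matplotlib.pyplot as plt

rates = np.array(expon_lr.rates)
losses = np.array(expon_lr.losses)
finite = np.isfinite(losses)  # keep only entries recorded before the loss went NaN

plt.semilogx(rates[finite], losses[finite])  # log scale on the learning-rate axis
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.show()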

Upvotes: 0

Views: 511

Answers (1)

Alexandre Catalano

Reputation: 772

Your problem is the ExponentialLearningRate callback: your learning rate grows from 0.0010150751 to 5.237502, which is why your loss is exploding. Change the optimizer like this:

optimizer=tf.keras.optimizers.Adam(0.001)

and remove the callback; your loss will be fine then.
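
In context, a minimal sketch of that fix (re-compiling the model from the question with Adam and fitting without the callback; everything else unchanged):

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(0.001),
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid))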

Upvotes: 1
