Oria Gruber

Reputation: 1533

Train accuracy decreases with train loss

I wrote this very simple code

model = keras.models.Sequential()
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=13000, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])

model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1)

The data is MNIST, restricted to the digits '0' and '1'. I have a very strange issue: the loss is monotonically decreasing to zero, as expected, yet the accuracy, instead of increasing, is also decreasing.

Here is a sample output

12665/12665 [==============================] - 0s 11us/step - loss: 0.0107 - accuracy: 0.2355
Epoch 181/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0114 - accuracy: 0.2568
Epoch 182/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0128 - accuracy: 0.2726
Epoch 183/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0133 - accuracy: 0.2839
Epoch 184/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0134 - accuracy: 0.2887
Epoch 185/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0110 - accuracy: 0.2842
Epoch 186/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0101 - accuracy: 0.2722
Epoch 187/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0094 - accuracy: 0.2583

Since we only have two classes, the worst accuracy random guessing should achieve is about 0.5. Furthermore, we are monitoring accuracy on the training set, so it should be climbing toward 100%. I expect overfitting, and according to the loss function I am indeed overfitting.

At the final epoch, this is the situation

12665/12665 [==============================] - 0s 11us/step - loss: 9.9710e-06 - accuracy: 0.0758

An accuracy of roughly 7%, when the worst you should get by guessing randomly is about 50%. This is no accident; something is going on here.
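For context, the accuracy reported above is Keras's binary_accuracy metric, which is selected automatically when the loss is binary_crossentropy. As far as I can tell, in the Keras version producing this log format it amounts to the following (a rough NumPy sketch, assuming 1-D arrays):

import numpy as np

# Roughly what binary_accuracy computes here: the raw model output is
# rounded and compared with the 0/1 label.
def binary_accuracy(y_true, y_pred):
    return np.mean(np.equal(y_true, np.round(y_pred)))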

Can anyone see the problem?

Entire code

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import Callback
import numpy as np
from matplotlib import pyplot as plt
import warnings

class EarlyStoppingByLossVal(Callback):
    """Stop training once the monitored quantity drops below a threshold."""

    def __init__(self, monitor='val_loss', value=0.00001, verbose=0):
        super(EarlyStoppingByLossVal, self).__init__()
        self.monitor = monitor
        self.value = value
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        current = logs.get(self.monitor)
        if current is None:
            warnings.warn("Early stopping requires %s available!" % self.monitor, RuntimeWarning)
            return  # avoid comparing None against the threshold

        if current < self.value:
            if self.verbose > 0:
                print("Epoch %05d: early stopping THR" % epoch)
            self.model.stop_training = True

def load_mnist():
    # Load MNIST and flatten each 28x28 image into a 784-dimensional vector.
    mnist = keras.datasets.mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

    train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1] * train_images.shape[2]))
    test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1] * test_images.shape[2]))

    # Keep only the digits '0' and '1'; the images are filtered first, while
    # the label arrays are still unfiltered and the masks line up.
    train_images = train_images[(train_labels == 0) | (train_labels == 1)]
    test_images = test_images[(test_labels == 0) | (test_labels == 1)]
    train_labels = train_labels[(train_labels == 0) | (train_labels == 1)]
    test_labels = test_labels[(test_labels == 0) | (test_labels == 1)]

    # Scale pixel values to [0, 1].
    train_images, test_images = train_images / 255, test_images / 255

    return train_images, train_labels, test_images, test_labels



X_train, y_train, X_test, y_test = load_mnist()
train_acc = []
train_errors = []
test_acc = []
test_errors = []

width_list = [13000]
for width in width_list:
    print(width)

    model = keras.models.Sequential()
    model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
    model.add(layers.Dense(1, input_dim=width, activation='linear'))
    model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])

    callbacks = [EarlyStoppingByLossVal(monitor='loss', value=0.00001, verbose=1)]
    model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1, callbacks=callbacks)


    # Evaluate once per split; evaluate() returns [loss, accuracy].
    train_loss_val, train_acc_val = model.evaluate(X_train, y_train)
    test_loss_val, test_acc_val = model.evaluate(X_test, y_test)
    train_errors.append(train_loss_val)
    test_errors.append(test_loss_val)
    train_acc.append(train_acc_val)
    test_acc.append(test_acc_val)


# Plot each tracked quantity against the layer width.
for values, label in [(train_errors, "train loss"), (test_errors, "test loss"),
                      (train_acc, "train acc"), (test_acc, "test acc")]:
    plt.plot(width_list, values, marker='D')
    plt.xlabel("width")
    plt.ylabel(label)
    plt.show()

Upvotes: 0

Views: 113

Answers (1)

desertnaut

Reputation: 60399

A linear activation in the last layer for a (binary) classification problem is meaningless; change your last layer to:

model.add(layers.Dense(1, input_dim=width, activation='sigmoid'))

Linear activations in the last layer are used for regression problems, not for classification ones. The mismatch also explains the paradoxical numbers: Keras clips the raw outputs into (0, 1) before computing binary cross-entropy, so arbitrarily large positive or negative outputs drive the loss toward zero, while the accuracy metric (in the Keras versions that print this log format) rounds the raw output and compares it with the 0/1 label, so a confident raw output like 4.7 rounds to 5 and counts as wrong. That is how the loss can vanish while training accuracy falls far below 50%.
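For reference, a minimal corrected sketch of the model definition (assuming the same X_train and width as in the question; the commented-out alternative is a suggestion that relies on keras.losses.BinaryCrossentropy being available, as it is in recent tf.keras versions):

model = keras.models.Sequential()
model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, activation='sigmoid'))  # outputs are now probabilities in (0, 1)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Alternative: keep a linear output but tell the loss to expect raw logits.
# Note that the 'accuracy' metric would still round/threshold the raw outputs,
# so the sigmoid version above is the simpler choice when monitoring accuracy.
# model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True),
#               optimizer='adam', metrics=['accuracy'])

With a sigmoid output the predictions are valid probabilities, so the loss and the accuracy metric agree and move together as expected.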

Upvotes: 1
