wohe1

Reputation: 775

CNN train accuracy gets better during training, but test accuracy stays around 40%

Over the past few months I've been learning a lot about neural networks with TensorFlow and Keras, so I wanted to try building a model for the CIFAR-10 dataset (code below).

However, during training the accuracy improves (from about 35% after 1 epoch to about 60-65% after 5 epochs), but the val_acc stays roughly the same or increases only a little. Here are the printed results:

Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss:  7.435152648162842 
accuracy:  0.5321

I've looked around on the internet and my best guess is that my model is overfitting, so I've tried removing some layers, adding more dropout layers and reducing the number of filters, but none of it helped.
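
(For illustration only, this is not my exact code, but the variations I tried were along these lines: fewer filters per conv layer and an extra Dropout after each pooling step.)

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))    # extra dropout after pooling
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))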

The weirdest thing is that a while ago I made a very similar model, based on some tutorials, which reached a final accuracy of 80% after 8 epochs. (I've lost that file, though.)

Here is the code of my model:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam
from keras.losses import categorical_crossentropy

model = Sequential()
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))


model.compile(optimizer=Adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)

train_images and test_images are NumPy arrays of shape (50000, 32, 32, 3) and (10000, 32, 32, 3), and train_labels and test_labels are one-hot encoded NumPy arrays of shape (50000, 10) and (10000, 10).

My question: what causes this and what can I do about it?

Edit after Maxim's answer:

I changed the model to this:

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',    # better for ReLU-based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))

and the output is now this:

Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss:  5.74234927444458 
accuracy:  0.6399

It seems that I'm overfitting again, even though I changed the model with the help I've gotten so far... Any explanations or tips?

The input images are (32, 32, 3) NumPy arrays normalized to the [0, 1] range.
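
(Concretely, the scaling is essentially the same as the one suggested in the answer below:)

train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255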

Upvotes: 4

Views: 5193

Answers (1)

Maxim

Reputation: 53768

You haven't included how you prepare the data; here's one addition that made this network learn much better:

# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

If you do data normalization like that, then your network is fine: it hits ~65-70% test accuracy after 5 epochs, which is a good result. Note that 5 epochs is just a start; it would need around 30-50 epochs to really learn the data well and show a result close to the state of the art.
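
(For a longer run, a minimal sketch, assuming the same model and the normalized x_train / x_test from above, is simply to raise epochs in fit and optionally stop once val_loss stops improving:)

from keras.callbacks import EarlyStopping

# stop training if val_loss hasn't improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)

model.fit(x_train, y_train,
          batch_size=500,
          epochs=50,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[early_stop])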

Below are some minor improvements that I noticed and that can get you a few extra percentage points of accuracy:

  • Since you're using a ReLU-based network, the he_normal initializer works better than glorot_uniform (the default in Conv2D).
  • It is strange to decrease the number of filters as you go deeper into the network. You should do the opposite. I changed 256 -> 64 and 128 -> 256 and the accuracy improved.
  • I decreased the dropout slightly, 0.5 -> 0.4.
  • A 3x3 kernel is more common than 2x2; I think you should try it for the second conv layer as well. In fact, you can play with all the hyper-parameters to find the best combination.

Here's the final code:

from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam
from keras.losses import categorical_crossentropy

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(2, 2),
                 kernel_initializer='he_normal',
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=Adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=500,
          epochs=5,
          verbose=1,
          validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test)
print('loss: ', loss, '\naccuracy: ', accuracy)

The result after 5 epochs:

loss:  0.822134458447 
accuracy:  0.7126

By the way, you might be interested in comparing your approach with the Keras CIFAR-10 ConvNet example.

Upvotes: 6
