Reputation: 775
In the past few months I've been learning a lot about neural networks with TensorFlow and Keras, so I wanted to try building a model for the CIFAR-10 dataset (code below).
However, during training the accuracy improves (from about 35% after 1 epoch to about 60-65% after 5 epochs), while val_acc stays the same or increases only slightly. Here are the printed results:
Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss: 7.435152648162842
accuracy: 0.5321
I've looked around on the internet and my best guess is that my model is overfitting, so I've tried removing some layers, adding more dropout layers and reducing the number of filters, but none of it brought any improvement.
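For example, one of those attempts put an extra dropout layer after each pooling block and used fewer filters, roughly like this (a sketch of the kind of change I mean, not the exact code I ran):
# Sketch of one variant: fewer filters and extra dropout after each pooling block
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(128, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))   # drop 25% of activations after the first block
model.add(Conv2D(64, (2, 2), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))   # and again after the second block
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))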
The weirdest thing is that a while ago I made a very similar model, based on some tutorials, which had a final accuracy of 80% after 8 epochs. (I lost that file though)
Here is the code of my model:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import adam
from keras.losses import categorical_crossentropy

# two conv + pool blocks, then a dense classifier
model = Sequential()
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)
train_images and test_images are numpy arrays of size (50000, 32, 32, 3) and (10000, 32, 32, 3), and train_labels and test_labels are numpy arrays of size (50000, 10) and (10000, 10).
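Arrays with these shapes are what you get from keras.datasets.cifar10 once the labels are one-hot encoded; a minimal sketch of that (my exact preprocessing may differ):
from keras.datasets import cifar10
from keras.utils import to_categorical

# Sketch of how arrays with the shapes above are typically built
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
train_labels = to_categorical(train_labels, 10)   # (50000, 1) -> (50000, 10)
test_labels = to_categorical(test_labels, 10)     # (10000, 1) -> (10000, 10)
print(train_images.shape, train_labels.shape)     # (50000, 32, 32, 3) (50000, 10)
print(test_images.shape, test_labels.shape)       # (10000, 32, 32, 3) (10000, 10)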
My question: what causes this and what can I do about it?
I changed the model to this:
model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',  # better for relu based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))
and the output is now this:
Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss: 5.74234927444458
accuracy: 0.6399
It seems that I'm overfitting again, even though I changed the model with the help I've gotten so far... Any explanations or tips?
The input images are (32, 32, 3) numpy arrays, normalized to (0, 1).
Upvotes: 4
Views: 5193
Reputation: 53768
You haven't included how you prepare the data; here's one addition that made this network learn much better:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
If you do data normalization like that, your network is fine: it hits ~65-70% test accuracy after 5 epochs, which is a good result. Note that 5 epochs is just a start; it would need around 30-50 epochs to really learn the data well and show a result close to state of the art.
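If you do run it that long, something along these lines keeps the best weights and stops once validation stops improving (a sketch using the standard Keras callbacks, applied to the same model and arrays as in the final code below; the patience and epoch values are just placeholders):
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Sketch: longer training run with early stopping and checkpointing
callbacks = [
    EarlyStopping(monitor='val_loss', patience=10),     # stop when val_loss stalls
    ModelCheckpoint('cifar10_best.h5', monitor='val_loss',
                    save_best_only=True)                 # keep the best model seen so far
]
model.fit(x_train, y_train,
          batch_size=500,
          epochs=50,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=callbacks)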
Below are some minor improvements that I noticed and that can get you extra performance points:
- The he_normal initializer is better than glorot_uniform (which is the default in Conv2D).
- I changed the conv filter counts, 256 -> 64 and 128 -> 256, and the accuracy improved.
- Dropout reduced slightly, 0.5 -> 0.4.
- A 3x3 kernel is more common than 2x2. I think you should try it for the second conv layer as well. In fact, you can play with all hyper-parameters to find the best combination.
Here's the final code:
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import adam
from keras.losses import categorical_crossentropy
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# one-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(2, 2),
                 kernel_initializer='he_normal',
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

# normalize pixel values to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=500,
          epochs=5,
          verbose=1,
          validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test)
print('loss: ', loss, '\naccuracy: ', accuracy)
The result after 5 epochs:
loss: 0.822134458447
accuracy: 0.7126
By the way, you might be interested to compare your approach with the Keras example CIFAR-10 convnet.
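That example, as far as I remember, also applies data augmentation, which is the natural next step against overfitting once normalization is in place. A rough sketch with ImageDataGenerator, reusing the model and arrays from the code above (the shift and flip settings here are just typical values, not tuned):
from keras.preprocessing.image import ImageDataGenerator

# Sketch: random shifts and horizontal flips to reduce overfitting
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=500),
                    steps_per_epoch=len(x_train) // 500,
                    epochs=30,
                    verbose=1,
                    validation_data=(x_test, y_test))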
Upvotes: 6