I'm trying to fine-tune the last two layers of a VGG model on the LFW dataset. I've changed the softmax layer's dimensions by removing the original one and adding my own softmax layer with 19 outputs, since there are 19 classes I'm trying to train on. I also want to fine-tune the last fully connected layer in order to build a "custom feature extractor".
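For reference, a minimal sketch of that head replacement, assuming the pretrained network is already loaded in a variable named vgg (a hypothetical name) and that its last layer is the original 1000-way softmax:

from keras.models import Model
from keras.layers import Dense

# Attach a new 19-way softmax on top of the layer that fed the old one
x = vgg.layers[-2].output
predictions = Dense(19, activation='softmax', name='fc8')(x)
model = Model(input=vgg.input, output=predictions)  # Keras 1 API (inputs=/outputs= in Keras 2)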
I'm setting the layers that I want to be non-trainable like this:
for layer in model.layers:
    layer.trainable = False
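As written, this loop freezes every layer; to fine-tune only the last ones you have to switch those back on, and in Keras the trainable flags only take effect once the model is recompiled. A minimal sketch of that pattern (the slice and the optimizer are placeholders, assuming the last three layers are fc7, the batch normalization, and fc8, as in the summary below):

for layer in model.layers:
    layer.trainable = False
for layer in model.layers[-3:]:
    layer.trainable = True  # unfreeze fc7, batchnormalization_1 and fc8

# Recompile so the changed trainable flags actually take effect
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])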
On a GPU it takes me about 1 hour per epoch to train with 19 classes and a minimum of 40 images per class.
Since I don't have many samples, this training time seems strange.
Does anyone know why this is happening?
Here is the log:
Image shape: (224, 224, 3)
Number of classes: 19
K.image_dim_ordering: th
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
____________________________________________________________________________________________________
conv1_1 (Convolution2D) (None, 64, 224, 224) 1792 input_1[0][0]
____________________________________________________________________________________________________
conv1_2 (Convolution2D) (None, 64, 224, 224) 36928 conv1_1[0][0]
____________________________________________________________________________________________________
pool1 (MaxPooling2D) (None, 64, 112, 112) 0 conv1_2[0][0]
____________________________________________________________________________________________________
conv2_1 (Convolution2D) (None, 128, 112, 112) 73856 pool1[0][0]
____________________________________________________________________________________________________
conv2_2 (Convolution2D) (None, 128, 112, 112) 147584 conv2_1[0][0]
____________________________________________________________________________________________________
pool2 (MaxPooling2D) (None, 128, 56, 56) 0 conv2_2[0][0]
____________________________________________________________________________________________________
conv3_1 (Convolution2D) (None, 256, 56, 56) 295168 pool2[0][0]
____________________________________________________________________________________________________
conv3_2 (Convolution2D) (None, 256, 56, 56) 590080 conv3_1[0][0]
____________________________________________________________________________________________________
conv3_3 (Convolution2D) (None, 256, 56, 56) 590080 conv3_2[0][0]
____________________________________________________________________________________________________
pool3 (MaxPooling2D) (None, 256, 28, 28) 0 conv3_3[0][0]
____________________________________________________________________________________________________
conv4_1 (Convolution2D) (None, 512, 28, 28) 1180160 pool3[0][0]
____________________________________________________________________________________________________
conv4_2 (Convolution2D) (None, 512, 28, 28) 2359808 conv4_1[0][0]
____________________________________________________________________________________________________
conv4_3 (Convolution2D) (None, 512, 28, 28) 2359808 conv4_2[0][0]
____________________________________________________________________________________________________
pool4 (MaxPooling2D) (None, 512, 14, 14) 0 conv4_3[0][0]
____________________________________________________________________________________________________
conv5_1 (Convolution2D) (None, 512, 14, 14) 2359808 pool4[0][0]
____________________________________________________________________________________________________
conv5_2 (Convolution2D) (None, 512, 14, 14) 2359808 conv5_1[0][0]
____________________________________________________________________________________________________
conv5_3 (Convolution2D) (None, 512, 14, 14) 2359808 conv5_2[0][0]
____________________________________________________________________________________________________
pool5 (MaxPooling2D) (None, 512, 7, 7) 0 conv5_3[0][0]
____________________________________________________________________________________________________
flatten (Flatten) (None, 25088) 0 pool5[0][0]
____________________________________________________________________________________________________
fc6 (Dense) (None, 4096) 102764544 flatten[0][0]
____________________________________________________________________________________________________
fc7 (Dense) (None, 4096) 16781312 fc6[0][0]
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 4096) 16384 fc7[0][0]
____________________________________________________________________________________________________
fc8 (Dense) (None, 19) 77843 batchnormalization_1[0][0]
====================================================================================================
Total params: 134,354,771
Trainable params: 16,867,347
Non-trainable params: 117,487,424
____________________________________________________________________________________________________
None
Train on 1120 samples, validate on 747 samples
Epoch 1/20
1120/1120 [==============================] - 7354s - loss: 2.9517 - acc: 0.0714 - val_loss: 2.9323 - val_acc: 0.2316
Epoch 2/20
1120/1120 [==============================] - 7356s - loss: 2.8053 - acc: 0.1732 - val_loss: 2.9187 - val_acc: 0.3614
Epoch 3/20
1120/1120 [==============================] - 7358s - loss: 2.6727 - acc: 0.2643 - val_loss: 2.9034 - val_acc: 0.3882
Epoch 4/20
1120/1120 [==============================] - 7361s - loss: 2.5565 - acc: 0.3071 - val_loss: 2.8861 - val_acc: 0.4016
Epoch 5/20
1120/1120 [==============================] - 7360s - loss: 2.4597 - acc: 0.3518 - val_loss: 2.8667 - val_acc: 0.4043
Epoch 6/20
1120/1120 [==============================] - 7363s - loss: 2.3827 - acc: 0.3714 - val_loss: 2.8448 - val_acc: 0.4163
Epoch 7/20
1120/1120 [==============================] - 7364s - loss: 2.3108 - acc: 0.4045 - val_loss: 2.8196 - val_acc: 0.4244
Epoch 8/20
1120/1120 [==============================] - 7377s - loss: 2.2463 - acc: 0.4268 - val_loss: 2.7905 - val_acc: 0.4324
Epoch 9/20
1120/1120 [==============================] - 7373s - loss: 2.1824 - acc: 0.4563 - val_loss: 2.7572 - val_acc: 0.4404
Epoch 10/20
1120/1120 [==============================] - 7373s - loss: 2.1313 - acc: 0.4732 - val_loss: 2.7190 - val_acc: 0.4471
Epoch 11/20
1120/1120 [==============================] - 7440s - loss: 2.0766 - acc: 0.5036 - val_loss: 2.6754 - val_acc: 0.4565
Epoch 12/20
1120/1120 [==============================] - 7414s - loss: 2.0323 - acc: 0.5170 - val_loss: 2.6263 - val_acc: 0.4565
Epoch 13/20
1120/1120 [==============================] - 7413s - loss: 1.9840 - acc: 0.5420 - val_loss: 2.5719 - val_acc: 0.4592
Epoch 14/20
1120/1120 [==============================] - 7414s - loss: 1.9467 - acc: 0.5464 - val_loss: 2.5130 - val_acc: 0.4592
Epoch 15/20
1120/1120 [==============================] - 7412s - loss: 1.9039 - acc: 0.5652 - val_loss: 2.4513 - val_acc: 0.4592
Epoch 16/20
1120/1120 [==============================] - 7413s - loss: 1.8716 - acc: 0.5723 - val_loss: 2.3906 - val_acc: 0.4578
Epoch 17/20
1120/1120 [==============================] - 7415s - loss: 1.8214 - acc: 0.5866 - val_loss: 2.3319 - val_acc: 0.4538
Epoch 18/20
1120/1120 [==============================] - 7416s - loss: 1.7860 - acc: 0.5982 - val_loss: 2.2789 - val_acc: 0.4538
Epoch 19/20
1120/1120 [==============================] - 7430s - loss: 1.7623 - acc: 0.5973 - val_loss: 2.2322 - val_acc: 0.4538
Epoch 20/20
1120/1120 [==============================] - 7856s - loss: 1.7222 - acc: 0.6170 - val_loss: 2.1913 - val_acc: 0.4538
Accuracy: 45.38%
The results are not good, and I can't train on more data because it takes too long. Any ideas?
Please notice that you are trying to feed ~19 * 40 < 800 examples in order to train 16,867,347 parameters. That is roughly 16,867,347 / 800 ≈ 2e4 parameters per example. This simply cannot work properly. Try deleting all the fully connected layers (the Dense layers at the top) and putting in smaller Dense layers with e.g. ~50 neurons each. In my opinion this should help you improve accuracy and speed up training.
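A hypothetical sketch of what that suggestion could look like, reusing the frozen convolutional base up to pool5 (layer names follow the summary in the question; the optimizer and layer sizes are placeholders):

from keras.models import Model
from keras.layers import Flatten, Dense

# Keep only the convolutional base, frozen, and drop fc6/fc7/fc8 (~120M parameters)
base = Model(input=model.input, output=model.get_layer('pool5').output)  # Keras 1 API
for layer in base.layers:
    layer.trainable = False

# A much smaller fully connected head (~1.3M parameters)
x = Flatten()(base.output)
x = Dense(50, activation='relu')(x)
predictions = Dense(19, activation='softmax')(x)

small_model = Model(input=base.input, output=predictions)
small_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])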