Reputation: 51

Why doesn't the accuracy when training VGG-16 change much?

I'm currently trying to train a VGG-16 model on my dataset. The problem is that the accuracy barely changes, although it isn't stuck at one fixed value. The accuracy plot is shown below. Any suggestions as to why this happens?

I've followed several guides on fixing stuck accuracy, but none of them worked.

Figure of accuracy plot

EDIT:

200 Epochs

200 Epoch Plot

50 Epochs with Imagenet Weights

Code

The input to the model is 600 images of size 224x224x3, with two labels: dog and cat (0, 1).

Properties

imageSize = (224,224,3)
epochs = 25
batch_size = 32

Model

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# VGG-16 convolutional base with random weights and no fully connected top
vgg = VGG16(input_shape=imageSize,weights=None,include_top=False)

x = Flatten()(vgg.output)
prediction = Dense(1,activation='sigmoid')(x)

model = Model(inputs=vgg.input,outputs=prediction)
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])

Image Generator

from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator

imgGen = ImageDataGenerator(rotation_range=20,
                            width_shift_range=0.1,
                            height_shift_range=0.1,
                            shear_range=0.1,
                            zoom_range=0.2,
                            horizontal_flip=True,
                            vertical_flip=True,
                            preprocessing_function = preprocess_input)

Fit Model

r = model.fit_generator(imgGen.flow(trainX, trainY, batch_size=batch_size),
                        validation_data = imgGen.flow(testX, testY, batch_size=batch_size),
                        epochs=epochs,
                        steps_per_epoch=len(trainX)//batch_size,
                        validation_steps=len(testX)//batch_size,
                        verbose = 1,
                       )

Upvotes: 5

Answers (7)

mravciak

Reputation: 35

If you are training your model from scratch, do not forget about weight initialization. An example is here.

Upvotes: 0

Ray Walker

Reputation: 293

The Reason

The absence of fully connected layers before the prediction layer.

Background

You can view the VGG model (and most other models, too) as consisting of

a feature extractor (the convolutional and pooling layers) and
a fully connected network that uses the extracted features to learn the desired classification.

By using include_top=False, you remove the fully connected layers in the VGG16 model. Therefore, you end up with the feature extractor only. There is no network that uses the features for classification.

Solution

Add two fully connected hidden layers between the feature extractor (vgg) and your prediction layer. It is also recommended to use transfer learning, as your data set is too small for the huge VGG net with its millions of parameters. The code should look roughly like this (I have not tested it):

from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Sequential

# load the model (only the feature extractor) with the imagenet weights
vgg = VGG16(input_shape=imageSize, weights='imagenet', include_top=False, pooling='avg')
# freeze the feature extractor weights, as they're already pretrained on imagenet
vgg.trainable = False
# build the classifier model
model = Sequential()
# use vgg as feature extractor
model.add(vgg)
# add two hidden layers for classification
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
# add the prediction layer
model.add(Dense(1, activation='sigmoid'))
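To put "millions of parameters" into numbers, here is a small sketch that counts the weights of the VGG-16 convolutional base by hand and compares them with the dense head from the snippet above. The helper names are made up for this illustration, and the filter list is written out manually rather than read from Keras.

```python
# Parameter count of the VGG-16 convolutional base, computed by hand.
# A k x k conv layer has (k * k * in_channels + 1) * out_channels parameters.

# Output channels of the 13 conv layers in VGG-16 (pooling layers have no weights).
VGG16_CONV_FILTERS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return (in_units + 1) * out_units


def vgg16_base_params(input_channels=3):
    """Sum the parameters of all conv layers, chaining the channel counts."""
    total, in_ch = 0, input_channels
    for out_ch in VGG16_CONV_FILTERS:
        total += conv_params(in_ch, out_ch)
        in_ch = out_ch
    return total


base = vgg16_base_params()
# Head from the snippet above: pooling='avg' yields a 512-vector, then 512 -> 256 -> 1.
head = dense_params(512, 512) + dense_params(512, 256) + dense_params(256, 1)

print("conv base parameters:", base)
print("dense head parameters:", head)
print("base parameters per image (600 images):", base // 600)
```

With the base frozen, only the head's parameters are updated, which is a far smaller problem to fit with 600 images than training the whole base from random weights.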

Best wishes and luck to everyone!

Upvotes: 1

Khalid Saifullah

Reputation: 795

I presume you were looking for the reason why this happens, and it seems you didn't get an answer, so here it is.

The reason is that in VGGNet and AlexNet the parameter space is huge, and these networks lack techniques such as BatchNorm, which ResNet and later models use to cope with it. So with VGGNet you have to make the model converge yourself by tuning the hyperparameters, especially the learning rate; empirically, starting as low as 1e-6 can help it converge. A different weight initialization can also improve convergence considerably, because the default initialization doesn't work well in this case. Lastly, train for more epochs (around 100), as the parameter space is quite bumpy: the accuracy will oscillate a bit, but with a proper learning rate the model will converge, although it takes some time.
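As a rough illustration of the weight initialization point, the sketch below compares the scale of the Keras default initializer (Glorot uniform) with He normal, a common alternative for ReLU networks. He normal is my choice of example here, not something the answer names, and the helper functions are hypothetical.

```python
import math


def fans(in_channels, out_channels, kernel=3):
    """fan_in and fan_out of a kernel x kernel convolution."""
    receptive_field = kernel * kernel
    return receptive_field * in_channels, receptive_field * out_channels


def glorot_uniform_stddev(fan_in, fan_out):
    """Glorot uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).

    The standard deviation of that uniform distribution is limit / sqrt(3).
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return limit / math.sqrt(3.0)


def he_normal_stddev(fan_in):
    """He normal uses a (truncated) normal with stddev = sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


# A few (in_channels, out_channels) pairs taken from VGG-16's 3x3 conv layers.
for in_ch, out_ch in [(3, 64), (64, 64), (256, 256), (512, 512)]:
    fan_in, fan_out = fans(in_ch, out_ch)
    g = glorot_uniform_stddev(fan_in, fan_out)
    h = he_normal_stddev(fan_in)
    print(f"{in_ch:>3} -> {out_ch:<3}  glorot std={g:.4f}  he std={h:.4f}  ratio={h / g:.2f}")
```

In Keras, layers such as Conv2D and Dense accept kernel_initializer='he_normal' if you build the network yourself. Whether this helps in your case is something to verify experimentally.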

Hope this gives you some intuition.

Upvotes: 2

traivsh

Reputation: 115

For people who may have similar problems, you can try the following:

load pre-trained VGG-16 weights
only make last few convolutional layers trainable
use SGD optimiser and set learning rate low
set correct activation function at output layer
increase epochs

As for pre-trained weights, their benefit is that they help overcome the limitation of a small dataset, such as the OP's 600 images. But make sure that only the last few layers are trainable and the rest are frozen.

Upvotes: 3

piupiu_island

Reputation: 1

25 epochs is not enough; try 100 or 200 epochs.

import keras
from keras.optimizers import RMSprop

def model(self):
    # small CNN: three conv/pool stages followed by a dense classifier
    inputs = keras.layers.Input(shape=self.input_Shape)
    x = keras.layers.Conv2D(16, (3,3), activation='relu')(inputs)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Conv2D(32,(3,3),activation='relu')(x)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Conv2D(64,(3,3),activation='relu')(x)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(512,activation='relu')(x)
    # single sigmoid unit for binary classification
    outputs = keras.layers.Dense(1,activation='sigmoid')(x)

    model = keras.models.Model(inputs, outputs)
    model.summary()
    model.compile(optimizer=RMSprop(lr=0.001),
                  loss='binary_crossentropy',
                  metrics = ['acc'])

    return model

Upvotes: 0

Eric Yu

Reputation: 451

I'd suggest fine-tuning the pre-trained model and freezing the weights of the first few layers, like this:

vgg = VGG16(input_shape=imageSize,weights='imagenet',include_top=False)
# freeze every layer except the last 10
for layer in vgg.layers[0:-10]:
    layer.trainable = False
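To see what the slice [0:-10] actually selects, here is a sketch that applies it to a hand-written list of the layer names in the VGG-16 convolutional base. The list is an assumption typed out for illustration (the input layer's name in particular varies between Keras versions); on a real model you would inspect vgg.layers instead.

```python
# Layer names of VGG16(include_top=False), listed by hand for illustration.
# "input" is a placeholder: the real input layer name depends on the Keras version.
VGG16_BASE_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = VGG16_BASE_LAYERS[0:-10]    # same slice as in the loop above
trainable = VGG16_BASE_LAYERS[-10:]  # the layers the slice leaves untouched

# Pooling layers have no weights, so only the conv layers matter for training.
trainable_convs = [name for name in trainable if "conv" in name]

print("frozen:   ", frozen)
print("trainable:", trainable)
print("trainable conv layers:", len(trainable_convs))
```

Under this listing, the slice freezes everything up to block3_conv2 and leaves seven conv layers trainable. To fine-tune fewer layers, use a smaller number than 10 in the slice.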

Upvotes: 3

Dr. Snoopy

Reputation: 56407

Do not use the Adam optimizer to train VGG; it is well known to fail because of the large number of parameters in the VGG network. Just use SGD and tune the learning rate, say starting from 0.01 and multiplying it by 10 or by 0.1 until the training loss decreases nicely.
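The search procedure can be sketched on a toy problem. The code below is not VGG training: it runs plain gradient descent on a one-dimensional quadratic (a stand-in chosen so that it runs instantly) and tries learning rates spaced by factors of 10, keeping the one with the lowest final loss. The function name and the curvature value are assumptions made for this illustration.

```python
def final_loss(lr, curvature=50.0, w0=1.0, steps=20):
    """Run plain gradient descent on f(w) = 0.5 * curvature * w**2 and return f(w)."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # derivative of the toy loss
        w -= lr * grad        # SGD update without momentum
    return 0.5 * curvature * w * w


start_loss = 0.5 * 50.0 * 1.0 ** 2  # loss at the starting point w0 = 1.0

# Candidate learning rates, each a factor of 10 apart.
candidates = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]
results = {lr: final_loss(lr) for lr in candidates}

for lr, loss in results.items():
    status = "decreased" if loss < start_loss else "diverged"
    print(f"lr={lr:g}  final loss={loss:.3g}  ({status})")

best_lr = min(results, key=results.get)
print("best learning rate on this toy problem:", best_lr)
```

The pattern is the point: rates that are too large make the loss blow up, rates that are too small barely move it, and one band in between works well. For the real model you would replace final_loss with a short training run using keras.optimizers.SGD and compare the training loss.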

Upvotes: 17
