Reputation: 51
I'm currently trying to train a VGG-16 model on my dataset. The problem is that the accuracy barely changes, although it isn't stuck at a single fixed value. The plot can be seen below. Any suggestions as to why this happens?
I've followed several guides on fixing stuck accuracy, but they don't work.
EDIT:
The input to the model is 600 images of size 224x224x3, with two labels: dog and cat (0, 1).
Properties
imageSize = (224,224,3)
epochs = 25
batch_size = 32
Model
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# VGG16 convolutional base with random weights and no fully connected top
vgg = VGG16(input_shape=imageSize, weights=None, include_top=False)
x = Flatten()(vgg.output)
prediction = Dense(1, activation='sigmoid')(x)
model = Model(inputs=vgg.input, outputs=prediction)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Image Generator
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
imgGen = ImageDataGenerator(rotation_range=20,
                            width_shift_range=0.1,
                            height_shift_range=0.1,
                            shear_range=0.1,
                            zoom_range=0.2,
                            horizontal_flip=True,
                            vertical_flip=True,
                            preprocessing_function=preprocess_input)
Fit Model
r = model.fit_generator(imgGen.flow(trainX, trainY, batch_size=batch_size),
                        validation_data=imgGen.flow(testX, testY, batch_size=batch_size),
                        epochs=epochs,
                        steps_per_epoch=len(trainX)//batch_size,
                        validation_steps=len(testX)//batch_size,
                        verbose=1,
                        )
Upvotes: 5
Views: 10389
Reputation: 35
If you are training your model from scratch, do not forget about weight initialization. An example is here.
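The linked example is not reproduced in this thread. As one illustration of what a non-default initialization looks like, here is a minimal plain-Python sketch of He-style (fan-in scaled) normal weights for a 3x3 convolution. Picking He initialization is my assumption, not necessarily what the link showed, and the helper names are hypothetical. For layers you build yourself in Keras, the corresponding option is kernel_initializer='he_normal'.

```python
import math
import random


def he_normal_std(kernel_h, kernel_w, in_channels):
    """Standard deviation sqrt(2 / fan_in) used by He-style initialization."""
    fan_in = kernel_h * kernel_w * in_channels
    return math.sqrt(2.0 / fan_in)


def he_normal_weights(kernel_h, kernel_w, in_channels, out_channels, seed=0):
    """Draw a flat list of conv weights from N(0, std^2) (hypothetical helper)."""
    rng = random.Random(seed)
    std = he_normal_std(kernel_h, kernel_w, in_channels)
    count = kernel_h * kernel_w * in_channels * out_channels
    return [rng.gauss(0.0, std) for _ in range(count)]


# Example: a 3x3 convolution with 64 input and 64 output channels
weights = he_normal_weights(3, 3, 64, 64)
empirical_std = math.sqrt(sum(w * w for w in weights) / len(weights))
print(len(weights), he_normal_std(3, 3, 64), empirical_std)
```

The point is only that the spread of the initial weights shrinks as the fan-in grows, which keeps activations from blowing up or vanishing in a deep stack of ReLU layers.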
Upvotes: 0
Reputation: 293
The problem is the absence of fully connected layers before the prediction layer.
You can view the VGG model (and actually most other models too) as consisting of two parts: a feature extractor (the convolutional layers) and a classifier (the fully connected layers).
By using include_top=False, you remove the fully connected layers of the VGG16 model. Therefore, you end up with the feature extractor only. There is no network that uses the features for classification.
Add two fully connected hidden layers between the feature extractor (vgg) and your prediction layer. It is also recommended to use transfer learning, as your dataset is too small for the huge VGG net with its millions of parameters. The code should look roughly like this (I have not tested it):
from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Sequential

# load the model (only the feature extractor) with the imagenet weights
vgg = VGG16(input_shape=imageSize, weights='imagenet', include_top=False, pooling='avg')
# freeze the feature extractor weights, as they're already pretrained on imagenet
vgg.trainable = False
# build the classifier model
model = Sequential()
# use vgg as feature extractor
model.add(vgg)
# add two hidden layers for classification
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
# add the prediction layer
model.add(Dense(1, activation='sigmoid'))
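To put numbers on "millions of parameters", here is a small plain-Python sketch that counts the weights of the frozen VGG16 feature extractor and of the three Dense layers added on top. It assumes the standard VGG16 layout (3x3 convolutions with biases, five blocks) and a 512-dimensional feature vector after pooling='avg'. The helper names are made up for this example.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


# VGG16 convolutional blocks: (number of conv layers, output channels)
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_base_params(in_ch=3):
    """Parameter count of the VGG16 feature extractor (include_top=False)."""
    total = 0
    for n_convs, out_ch in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += conv_params(3, in_ch, out_ch)
            in_ch = out_ch
    return total


frozen = vgg16_base_params()
# Head from the code above: 512 features -> Dense(512) -> Dense(256) -> Dense(1)
head = dense_params(512, 512) + dense_params(512, 256) + dense_params(256, 1)
print(frozen, head)
```

This gives 14,714,688 frozen parameters against 394,241 trainable ones, so with the base frozen only a small fraction of the network has to be learned from the 600 images.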
Best wishes and luck to everyone!
Upvotes: 1
Reputation: 785
I presume you were looking for the reason why this happens, and it seems you didn't get an answer, so here it is.
The reason is that in VGGNet and AlexNet the parameter space is huge, and these networks have none of the techniques that deal with this, such as the BatchNorm used in ResNet and later models. So to make VGGNet converge you have to do the work yourself: play with the hyperparameters, especially the learning rate. Empirically, starting with a rate as low as 1e-6 can help it converge. Using a different weight initialization can also improve convergence considerably, because the default initialization doesn't work well in this case. Lastly, train the model for more epochs (like 100), as the parameter space is quite bumpy. You'll see it oscillate a bit, but with a proper learning rate it will converge, although it takes some time.
Hope this gives you some intuition.
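For a rough feel of why more epochs matter here, this sketch counts how many weight updates a run actually performs, using the same steps_per_epoch formula as in the question. The train split of 480 images is an assumption (the question only says there are 600 images in total), and total_updates is a hypothetical helper.

```python
def total_updates(num_train, batch_size, epochs):
    """Number of optimizer steps over a whole training run."""
    steps_per_epoch = num_train // batch_size  # same formula as in the question
    return steps_per_epoch * epochs


NUM_TRAIN = 480  # assumed train split, not stated in the question
BATCH_SIZE = 32  # from the question

for epochs in (25, 100):
    print(epochs, total_updates(NUM_TRAIN, BATCH_SIZE, epochs))
```

Under that assumption, 25 epochs is only 375 updates and 100 epochs is 1500, which is very little for a very small learning rate to make progress.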
Upvotes: 2
Reputation: 115
For people who may have similar problems, you can try the following:
Regarding pre-trained weights: their benefit is that you can overcome the limitation of a small dataset, such as the OP's situation with 600 images. But you have to make sure that only the last few layers are trainable and the rest are frozen.
Upvotes: 3
Reputation: 1
25 epochs is not enough; try 100 or 200 epochs.
import keras

def model(self):
    # small CNN: three conv/pool stages followed by a dense classifier
    inputs = keras.layers.Input(shape=self.input_Shape)
    x = keras.layers.Conv2D(16, (3,3), activation='relu')(inputs)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Conv2D(32, (3,3), activation='relu')(x)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Conv2D(64, (3,3), activation='relu')(x)
    x = keras.layers.MaxPooling2D(2,2)(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)
    model = keras.models.Model(inputs, outputs)
    model.summary()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['acc'])
    return model
Upvotes: 0
Reputation: 451
I'd suggest fine-tuning the pre-trained model and freezing the weights of the first few layers, like this:
vgg = VGG16(input_shape=imageSize, weights='imagenet', include_top=False)
for layer in vgg.layers[0:-10]:
    layer.trainable = False
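To see what vgg.layers[0:-10] actually selects, here is a plain-Python sketch that applies the same slice to a list of layer names. The names follow the usual Keras VGG16 naming as far as I know (the input layer's name varies between versions, so "input" is a placeholder), so treat the list as an assumption and check it against vgg.summary().

```python
# Layers of VGG16 with include_top=False: input + 13 conv + 5 pooling layers
VGG16_BASE_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = VGG16_BASE_LAYERS[0:-10]   # same slice as in the loop above
trainable = VGG16_BASE_LAYERS[-10:]  # everything the slice leaves out

print("frozen:   ", frozen)
print("trainable:", trainable)
```

With 19 layers in total, the slice freezes the first 9 (up to block3_conv2) and leaves block3_conv3 through block5_pool trainable. A smaller negative index, such as -4, would keep more of the network frozen.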
Upvotes: 3
Reputation: 56357
Do not use the adam optimizer to train VGG; it is well known that it fails due to the large number of parameters in the VGG network. Just use sgd and tune the learning rate, say starting from 0.01 and multiplying it by 10 or by 0.1 until the training loss decreases nicely.
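The tuning loop described above can be sketched on a toy problem. This is not VGG: it runs plain gradient descent on a one-dimensional quadratic loss (my stand-in, with a made-up curvature) just to show the pattern of trying rates a factor of 10 apart around 0.01 and keeping the one whose loss ends lowest. With a real network you would train briefly per candidate and compare training loss instead.

```python
def final_loss(lr, steps=50, curvature=4.0, w0=1.0):
    """Run plain gradient descent on loss(w) = 0.5 * curvature * w**2."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # derivative of the quadratic loss
        w -= lr * grad
    return 0.5 * curvature * w * w


# Candidates around 0.01, each a factor of 10 apart: 1e-4, 1e-3, 1e-2, 1e-1, 1
candidates = [0.01 * 10 ** k for k in (-2, -1, 0, 1, 2)]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr in candidates:
    print(lr, results[lr])
print("best:", best_lr)
```

On this toy loss the smallest rates barely move, the largest one diverges, and a rate in between gives the lowest loss. That is the same qualitative picture you look for when sweeping the sgd learning rate on the real model.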
Upvotes: 16