Reputation: 1193
I have been trying to build a CNN for my image dataset, and I have tried two different models, both using ImageDataGenerator pre-processing. This is my code for the first model:
from keras.preprocessing.image import ImageDataGenerator

height = 150
width = 150
channels = 3
batch_size = 32
seed = 1337

# Training generator
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(height, width),
    batch_size=batch_size,
    seed=seed,
    class_mode='categorical')

# Test generator
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(height, width),
    batch_size=batch_size,
    seed=seed,
    class_mode='categorical')
which gives an output:
Found 3004 images belonging to 7 classes.
Found 794 images belonging to 7 classes.
This is my model architecture:
from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(150, 150, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(32, (3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# the model so far outputs 3D feature maps (height, width, features)
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(7))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=3004 // batch_size,
    epochs=50,
    validation_data=test_generator,
    validation_steps=794 // batch_size)
And after 30 epochs this is my status:
Epoch 30/50
94/93 [==============================] - 295s 3s/step - loss: 0.1396 - acc: 0.9433 - val_loss: 2.3553 - val_acc: 0.4534
which shows it's completely overfitting.
Now this is the second model that I tried:
Image generator:
train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=32)

validation_generator = validation_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=32)
And my model architecture:
cnn = Sequential()
cnn.add(Conv2D(filters=32,
               kernel_size=(2, 2),
               strides=(1, 1),
               padding='same',
               input_shape=(150, 150, 3),
               data_format='channels_last'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2),
                     strides=2))
cnn.add(Conv2D(filters=64,
               kernel_size=(2, 2),
               strides=(1, 1),
               padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2),
                     strides=2))
cnn.add(Flatten())
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.25))
cnn.add(Dense(7))
cnn.add(Activation('softmax'))

cnn.compile(loss='categorical_crossentropy',
            optimizer='rmsprop',
            metrics=['accuracy'])
And this is what I get after 30 epochs:
Epoch 30/50
94/93 [==============================] - 295s 3s/step - loss: 1.1396 - acc: 0.5633 - val_loss: 1.3553 - val_acc: 0.5534
which shows that the model is not overfitting, but it is certainly not able to predict well.
Based on the above two models, is the problem with the models themselves, or with the images that I have? What would be the best way to tackle this, and how can I try different models to check which one works well?
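I assume plotting the history object is the right way to compare runs — something like this (just a rough sketch; the 'acc'/'val_acc' keys match the metric names in the logs above):
import matplotlib.pyplot as plt

# Training vs. validation accuracy per epoch: a widening gap suggests
# overfitting, two low flat curves suggest underfitting
plt.plot(history.history['acc'], label='train acc')
plt.plot(history.history['val_acc'], label='val acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()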
I also want to know: if I do manage to build a working model, how do I predict on new images? model.predict
doesn't seem to work, and if I keep my images inside a folder and use model.predict_generator
it shows:
Found 0 images belonging to 0 classes
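For reference, this is roughly what I expect to work for a single image (a sketch only; 'new_image.jpg' is just a placeholder path, and the 1./255 rescale mirrors the training generators):
import numpy as np
from keras.preprocessing import image

# Load and preprocess one image the same way the training generator does
img = image.load_img('new_image.jpg', target_size=(150, 150))
x = image.img_to_array(img) / 255.0   # same rescale as the generators
x = np.expand_dims(x, axis=0)         # add the batch dimension

preds = model.predict(x)              # shape (1, 7): one probability per class
print(np.argmax(preds[0]))            # index of the predicted class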
But my first priority is figuring out how to build the model: it's either overfitting or underfitting, and I'm not quite able to figure out the problem.
Upvotes: 0
Views: 1418
Reputation: 340
The second model, with similar training and validation accuracy, looks better, perhaps because the model is simpler, which prevents overfitting. I'd say that you probably need more data: only 3,000 images for a CNN model seems low, whereas ImageNet uses millions of images. You may want to increase the number of images using data augmentation techniques such as image transformations (scaling, rotation, translation), adding Gaussian noise to the image, etc.
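For the Gaussian-noise idea, one possible hook is ImageDataGenerator's preprocessing_function (a rough sketch, not tested on your data; note the function receives the resized image before the rescale is applied, so pixel values are still in the 0-255 range):
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def add_gaussian_noise(img):
    # img is a float array in the 0-255 range, so use a noise scale
    # of a few pixel values and clip back to the valid range
    noise = np.random.normal(loc=0.0, scale=10.0, size=img.shape)
    return np.clip(img + noise, 0.0, 255.0)

noisy_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    preprocessing_function=add_gaussian_noise)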
Upvotes: 2