Josh Newham

Reputation: 47

CNN architecture: classifying "good" and "bad" images

I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.

Characteristics that denote a "bad" image:

Would it be feasible to implement a neural network to classify images based on these characteristics, or is this best left to a traditional algorithm that simply looks at the variance in brightness/contrast across an image and classifies it that way?
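For reference, the kind of traditional check I have in mind would look something like this (a rough sketch; the tile size and variance threshold are arbitrary placeholders, not tuned values):

    import cv2
    import numpy as np

    def looks_bad(path, tile=64, threshold=500.0):
        # split the image into tiles and measure how unevenly
        # brightness is distributed across them
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        h, w = gray.shape
        tile_means = [gray[y:y + tile, x:x + tile].mean()
                      for y in range(0, h - tile + 1, tile)
                      for x in range(0, w - tile + 1, tile)]
        # a large spread in per-tile brightness suggests a "bad" image
        return np.var(tile_means) > threshold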

I have attempted training a CNN based on the VGGNet architecture, but I always seem to end up with a biased and unreliable model, regardless of the number of epochs or steps per epoch.

Examples:

["Good" photo]  ["Bad" photo]

My current model's architecture is very simple (I am new to the whole machine learning world), but it seemed to work fine with other classification problems; I have modified it slightly to work better with this binary classification problem:

    # CONV => RELU => POOL layer set
    # define convolutional layers, use "ReLU" activation function
    # and reduce the spatial size (width and height) with pool layers
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters (input_shape is only needed on the first layer)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # only set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # sigmoid classifier (output layer)
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))

Are there any glaring omissions or mistakes in this model, or is this problem simply not solvable using deep learning (at least with my current GPU, a GTX 970)?

Here is my code for compiling/training the model:

# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]

# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50,
                        validation_data=test_set, validation_steps=150,
                        callbacks=callbacks_list)

Upvotes: 1

Views: 2216

Answers (2)

desertnaut

Reputation: 60370

Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:

# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))

A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that

with three different images, my results are 0.987 bad and 0.999 good

and

I was giving you the predictions from the model previously

you should use a softmax activation, i.e.

model.add(Dense(classes))
model.add(Activation("softmax"))

Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.

model.add(Dense(1))
model.add(Activation("sigmoid"))

The latter is usually preferred in binary classification settings, but the results should be the same in principle.
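To make the difference concrete, here is how the predictions would be decoded in each case (a sketch; model and x stand for your trained model and a batch of preprocessed images):

import numpy as np

# 2-node softmax head: one probability per class, take the argmax
probs = model.predict(x)   # shape (n_samples, 2)
labels = np.argmax(probs, axis=1)

# 1-node sigmoid head: a single P(class=1), threshold at 0.5
probs = model.predict(x)   # shape (n_samples, 1)
labels = (probs > 0.5).astype("int32").ravel()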

UPDATE (after updating the question):

sparse_categorical_crossentropy is not the correct loss here, either.

All in all, try the following changes:

model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))

with the Adam optimizer (which needs an import - see below). Also, dropout should not be used by default - see this thread; start without it, and only add it if necessary (i.e. if you see signs of overfitting).
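For reference, assuming you are using standalone Keras (as your snippets suggest), the missing import is:

from keras.optimizers import Adam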

Upvotes: 5

Mukul

Reputation: 890

I suggest you go for transfer learning instead of training the whole network: use weights trained on a huge dataset like ImageNet.

You can do this easily in Keras: import a model with pretrained weights (e.g. Xception), remove the last layer (which represents the 1000 ImageNet classes) and replace it with a 2-node dense layer, since you have only 2 classes, then set trainable=False for the base layers and trainable=True for the custom added layers (such as the new dense layer).

Then you can train the model in the usual way, as shown after the demo code.

Demo code -

from keras.applications import Xception
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# load the ImageNet-pretrained base without its 1000-class top
base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freeze the base layer weights
for layer in base_model.layers:
    layer.trainable = False
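After that, compile and train as usual; for example (generator names follow the question's code, hyperparameters are placeholders, and the loss assumes one-hot labels - use sparse_categorical_crossentropy for integer labels):

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit_generator(training_set, steps_per_epoch=500, epochs=10,
                    validation_data=test_set, validation_steps=150)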

Upvotes: 3
