Josh Newham

Reputation: 47

CNN architecture: classifying "good" and "bad" images

I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.

Characteristics that denote a "bad" image:

Would it be feasible to implement a neural network to classify images based on these characteristics, or is this best left to a traditional algorithm that simply looks at the variance in brightness/contrast across an image and classifies it that way?
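For reference, the kind of traditional check I have in mind would look something like this (a rough sketch; the tile size and variance threshold are arbitrary placeholders, not tuned values):

    import cv2
    import numpy as np

    def looks_bad(path, tile=64, threshold=500.0):
        # split the image into tiles and measure how unevenly
        # brightness is distributed across them
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        h, w = gray.shape
        tile_means = [gray[y:y + tile, x:x + tile].mean()
                      for y in range(0, h - tile + 1, tile)
                      for x in range(0, w - tile + 1, tile)]
        # a large spread in per-tile brightness suggests a "bad" image
        return np.var(tile_means) > threshold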

I have attempted training a CNN based on the VGGNet architecture, but I always seem to end up with a biased and unreliable model, regardless of the number of epochs or steps per epoch.

Examples:

["Good" photo]  ["Bad" photo]

My current model's architecture is very simple (I am new to the whole machine learning world), but it seemed to work fine with other classification problems; I have modified it slightly to work better with this binary classification problem:

    # CONV => RELU => POOL layer set
    # define convolutional layers, use "ReLU" activation function
    # and reduce the spatial size (width and height) with pool layers
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters (input_shape is only needed on the first layer)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # only set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # sigmoid classifier (output layer)
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))

Are there any glaring omissions or mistakes in this model, or is this problem simply not solvable using deep learning (at least with my current GPU, a GTX 970)?

Here is my code for compiling/training the model:

# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]

# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50,
                        validation_data=test_set, validation_steps=150,
                        callbacks=callbacks_list)

Upvotes: 1

Views: 2216

Answers (2)

desertnaut

Reputation: 60370

Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:

# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))

A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that

with three different images, my results are 0.987 bad and 0.999 good

and

I was giving you the predictions from the model previously

you should use a softmax activation, i.e.

model.add(Dense(classes))
model.add(Activation("softmax"))

Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.

model.add(Dense(1))
model.add(Activation("sigmoid"))

The latter is usually preferred in binary classification settings, but the results should be the same in principle.
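To make the difference concrete, here is how the predictions would be decoded in each case (a sketch; model and x stand for your trained model and a batch of preprocessed images):

import numpy as np

# 2-node softmax head: one probability per class, take the argmax
probs = model.predict(x)   # shape (n_samples, 2)
labels = np.argmax(probs, axis=1)

# 1-node sigmoid head: a single P(class=1), threshold at 0.5
probs = model.predict(x)   # shape (n_samples, 1)
labels = (probs > 0.5).astype("int32").ravel()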

UPDATE (after updating the question):

sparse_categorical_crossentropy is not the correct loss here, either.

All in all, try the following changes:

model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))

with the Adam optimizer (which needs an import - see below). Also, dropout should not be used by default - see this thread; start without it, and only add it if necessary (i.e. if you see signs of overfitting).
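For reference, assuming you are using standalone Keras (as your snippets suggest), the missing import is:

from keras.optimizers import Adam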

Upvotes: 5

Mukul

Reputation: 890

I suggest you go for transfer learning instead of training the whole network: use weights trained on a huge dataset like ImageNet.

You can do this easily in Keras: import a model with pretrained weights (e.g. Xception), remove the last layer (which represents the 1000 ImageNet classes) and replace it with a 2-node dense layer, since you have only 2 classes, then set trainable=False for the base layers and trainable=True for the custom added layers (such as the new dense layer).

Then you can train the model in the usual way, as shown after the demo code.

Demo code -

from keras.applications import Xception
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# load the ImageNet-pretrained base without its 1000-class top
base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freeze the base layer weights
for layer in base_model.layers:
    layer.trainable = False
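After that, compile and train as usual; for example (generator names follow the question's code, hyperparameters are placeholders, and the loss assumes one-hot labels - use sparse_categorical_crossentropy for integer labels):

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit_generator(training_set, steps_per_epoch=500, epochs=10,
                    validation_data=test_set, validation_steps=150)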

Upvotes: 3
