Why is this CNN-script not predicting correctly?

Question

I am quite new to both Python and machine learning, and I am working on my first real image-recognition project. It is based on this tutorial, which has only two classes (cat or dog) and a LOT more data. My multi-class script is nowhere near predicting correctly and, more importantly, I don't know how to troubleshoot it.

Below is the script. The data consists of 7 folders with about 10-15 images each. The images are 100x100px photos of different domino tiles, and one folder contains just baby photos (mainly as a control group, because they are very different from the domino photos):

from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.models import model_from_json
import numpy
import os

# Initialising the CNN
classifier = Sequential()

# Step 1 - Convolution
classifier.add(Conv2D(32, (25, 25), input_shape = (100, 100, 3), activation = 'relu'))

# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer
classifier.add(Conv2D(32, (25, 25), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 7, activation = 'sigmoid')) # 7 units equals amount of output categories

# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])


# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
    target_size = (100, 100),
    batch_size = 32,
    class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('dataset/test_set',
    target_size = (100, 100),
    batch_size = 32,
    class_mode = 'categorical')
classifier.fit_generator(training_set,
    steps_per_epoch = 168,
    epochs = 35,
    validation_data = test_set,
    validation_steps = 3)
classifier.summary()

# serialize weights to HDF5
classifier.save_weights("dominoweights.h5")
print("Saved model to disk")

# Part 3 - Making new predictions
import numpy as np
from keras.preprocessing import image

path = 'dataset/prediction_images/' # Folder with my images
for filename in os.listdir(path):
  if "jpg" in filename:
    test_image = image.load_img(path + filename, target_size = (100, 100))
    test_image = image.img_to_array(test_image)
    test_image = np.expand_dims(test_image, axis = 0)
    result = classifier.predict(test_image)
    print result
    training_set.class_indices
    folder = training_set.class_indices.keys()[(result[0].argmax())] # Get the index of the highest predicted value
    if folder == '1':
      prediction = '1x3'
    elif folder == '2':
      prediction = '1x8'
    elif folder == '3':
      prediction = 'Baby'
    elif folder == '4':
      prediction = '5x7'
    elif folder == '5':
      prediction = 'Upside down'
    elif folder == '6':
      prediction = '2x3'   
    elif folder == '7':
      prediction = '0x0'
    else:
      prediction = 'Unknown'
    print "Prediction: " + filename + " seems to be " + prediction
  else:
    print "DSSTORE"
  print "\n"

Explanations:

Training data: about 10-15 images in each category. In total there are 168 training images
Test data: 3 images each in each category
dataset/prediction_images/ contains about 10 different images that the script will predict
result typically outputs array([[0., 0., 1., 0., 0., 0., 0.]], dtype=float32)

My question(s)

My main question is: Do you see anything particularly wrong with the script? Or should the script work fine, and it is just the lack of data that makes the predictions wrong?

Subquestions:

Am I understanding the convolution layer(s) correctly, that there is a 25x25px window that scans the images? I tried the "default" 3x3px but got the same result.
The number 32 in the convolution layer. Does it refer to 32 bit images?
Is it normal to have 2 convolution layers? I can't really see why it is needed.
The entire section with:

classifier.fit_generator(training_set,
    steps_per_epoch = 168,
    epochs = 35,
    validation_data = test_set,
    validation_steps = 3)

puzzles me. As far as I understood, steps_per_epoch should be the number of training images I have. Is that correct? Are epochs the number of iterations the CNN does?

I don't see why this code is needed:

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)

It seems to me that it creates copies/versions of the images, zooms in on them, flips them, etc. Why would this be needed?

Any tips on this would help me immensely!

Daniel Möller · Accepted Answer

The code doesn't seem to have anything clearly wrong, but filters of size (25, 25) may not be a good choice.

There are two possibilities:

Train metrics are great, but test metrics are bad: your model is overfitting (possibly because there is too little data)
Train metrics are not good: your model is not good enough (it is underfitting)
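
As a rough illustration of that two-way check, here is a minimal sketch in plain Python. The function name `diagnose` and the thresholds `good` and `gap` are hypothetical values chosen only for the example; they are not Keras settings and would need tuning for a real project.

```python
def diagnose(train_acc, val_acc, good=0.9, gap=0.1):
    """Classify a training run from its final accuracies.

    `good` and `gap` are illustrative thresholds (assumptions),
    not values taken from Keras or from the question.
    """
    if train_acc < good:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_acc - val_acc > gap:
        # Fits the training data well but does not generalize.
        return "overfitting"
    return "ok"


print(diagnose(0.99, 0.40))  # overfitting
print(diagnose(0.30, 0.25))  # underfitting
```

The accuracies to feed in are the ones Keras prints per epoch during training (training accuracy and validation accuracy).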

Subquestions:

1 - Yes, you're using filters that are windows sized (25,25) that slide along the input images. The bigger your filters, the less general they can be.
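
To see how much a (25, 25) window shrinks the images, the side lengths can be traced by hand. The sketch below assumes the Keras defaults used in the question's script (no padding, stride 1 for the convolutions, 2x2 pooling); the helper names are made up for this example.

```python
def conv_out(size, kernel):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling (rounds down)."""
    return size // pool


def trace_sizes(input_size, kernel, n_blocks=2):
    """Side length after every conv and pool step, starting with the input."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_blocks):
        size = conv_out(size, kernel)
        sizes.append(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes


print(trace_sizes(100, 25))  # [100, 76, 38, 14, 7]
print(trace_sizes(100, 3))   # [100, 98, 49, 47, 23]
```

With 25x25 filters, the second pooling layer outputs only a 7x7 map, while 3x3 filters leave 23x23 positions for the dense layers to work with.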

2 - The number 32 refers to how many output "channels" (filters) you want for this layer; it has nothing to do with 32-bit images. While your input images have 3 channels (a red layer, a green layer and a blue layer), each of these convolution layers will produce 32 different channels. What each channel ends up representing is learned during training and is not something we can read off directly.

The number of output channels is a free choice; it does not depend on the image format or on the number of classes.
The only restrictions are: input channels are 3, output classes are 7.
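
One way to make the role of the 32 concrete is to count the trainable weights of each convolution layer. This is a sketch using the standard formula for a 2D convolution with a bias term; `conv2d_params` is a helper written for this example, not a Keras function.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable weights of a 2D convolution layer with bias.

    Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    """
    return (kernel_h * kernel_w * in_channels + 1) * filters


# First layer: 3 input channels (RGB), 32 filters.
print(conv2d_params(25, 25, 3, 32))   # 60032
# Second layer: its input is the 32 channels produced by the first layer.
print(conv2d_params(25, 25, 32, 32))  # 640032

# Same two layers with 3x3 filters.
print(conv2d_params(3, 3, 3, 32))     # 896
print(conv2d_params(3, 3, 32, 32))    # 9248
```

These numbers should match the "Param #" column that `classifier.summary()` prints for the two Conv2D layers.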

3 - It's normal to have "a lot" of convolutional layers, one over another. Some well known models have more than 10 convolutional layers.

Why is it needed? Each convolutional layer interprets the results of the previous layer and produces new results. This gives the model more power; a single layer may be too few.

4 - Generators produce batches with shape (batch_size, image_side1, image_side2, channels).

steps_per_epoch is necessary because the generators used are infinite (so Keras doesn't know when to stop).
Usually, one uses steps_per_epoch = total_images // batch_size, so that one epoch uses all images roughly once. But you can play with these numbers as you wish.
Usually, one epoch is one iteration through the entire dataset. (But with generators and steps_per_epoch, that is up to the user.)
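
Applied to the numbers in the question (168 training images, batch_size = 32), the arithmetic looks like this. The last two lines assume the generator emits a smaller final batch at the end of each pass over the folder, which is an assumption about the generator's behaviour rather than something stated above.

```python
import math

total_images = 168  # training images, from the question
batch_size = 32     # batch_size passed to flow_from_directory

# Floor division: only full batches are counted.
steps_floor = total_images // batch_size
# Ceiling: also counts the final, smaller batch of leftover images.
steps_ceil = int(math.ceil(total_images / float(batch_size)))

print(steps_floor)  # 5
print(steps_ceil)   # 6

# With steps_per_epoch = 168 as in the question, one "epoch" draws 168
# batches. Assuming 6 batches per pass, that is 28 passes over the data.
passes_per_epoch = 168 // steps_ceil
print(passes_per_epoch)  # 28
```

So with steps_per_epoch = 168 and epochs = 35, the network sees the same small dataset far more often than "35 epochs" suggests.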

5 - The image data generator, besides loading data from your folders and making the classes for you, is also a tool for data augmentation.

If you have too little data, your model will overfit (excellent train results, terrible test results).
Machine learning needs tons of data to work well.
Data augmentation is a way of creating more data when you don't have enough.
- A shifted, flipped, elongated, etc. image is, from the model's point of view, a totally new image
- A model can learn cats looking to the right and yet not learn cats looking to the left, for instance
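
To illustrate why a flipped image counts as new data, here is a toy sketch in plain Python. It does not use Keras; `horizontal_flip` is a stand-in written for this example, and the 3x3 "image" is made up.

```python
def horizontal_flip(image):
    """Mirror an image left to right; the image is a list of pixel rows."""
    return [list(reversed(row)) for row in image]


# Toy 3x3 grayscale "image" with a bright column on the left.
original = [
    [9, 0, 0],
    [9, 0, 0],
    [9, 0, 0],
]
flipped = horizontal_flip(original)

# Count the pixel positions whose value changed.
differing = sum(
    1
    for row_a, row_b in zip(original, flipped)
    for a, b in zip(row_a, row_b)
    if a != b
)

print(flipped)    # [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
print(differing)  # 6
```

Six of the nine pixel values differ, so the network receives a genuinely different input even though a person would call it the same picture.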
