Convolutional Neural Network seems to be randomly guessing

Question

So I am currently trying to build a race recognition program using a convolution neural network. I'm inputting 200px by 200px versions of the UTKFaceRegonition dataset (put my dataset on a google drive if you want to take a look). Im using 8 different classes (4 races * 2 genders) using keras and tensorflow, each having about 700 images but I have done it with 1000. The problem is when I run the network it gets at best 13.5% accuracy and about 11-12.5% validation accuracy, with a loss around 2.079-2.081, even after 50 epochs or so it won't improve at all. My current hypothesis is that it is randomly guessing stuff/not learning because 8/100=12.5%, which is about what it is getting and on other models I have made with 3 classes it was getting about 33%

I noticed the validation accuracy is different on the first and sometimes second epoch, but after that it ends up staying constant. I've increased the pixel resolution, changed amount of layers, types of layer and neurons per layer, I've tried optimizers (sgd at the normal lr and at very large and small (.1 and 10^-6) and I've tried different loss functions like KLDivergence but nothing seems to have any effect on it except KLDivergence which on one run did pretty well (about 16%) but then it flopped again. Some ideas I had are maybe theres too much noise in the dataset or maybe it has to do with the amount of dense layers, but honestly I dont know why it is not learning.

Heres the code to make the tensors

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import cv2
import random
import pickle

WIDTH_SIZE = 200
HEIGHT_SIZE = 200

CATEGORIES = []
for CATEGORY in os.listdir('./TRAINING'):
    CATEGORIES.append(CATEGORY)
DATADIR = "./TRAINING"
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path)[:700]:
            try:
                img_array = cv2.imread(os.path.join(path,img), cv2.IMREAD_COLOR)
                new_array = cv2.resize(img_array,(WIDTH_SIZE,HEIGHT_SIZE))
                training_data.append([new_array,class_num])
            except Exception as error:
                print(error)

create_training_data()

random.shuffle(training_data)
X = []
y = []

for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, WIDTH_SIZE, HEIGHT_SIZE, 3)
y = np.array(y)

pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)

Heres my built model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle

pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0

model = Sequential()
model.add(Conv2D(256, (2,2), activation = 'relu', input_shape = X.shape[1:]))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(8, activation="softmax"))

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])

model.fit(X, y, batch_size=16,epochs=100,validation_split=.1)

Heres a log of 10 epochs I ran.

5040/5040 [==============================] - 55s 11ms/sample - loss: 2.0803 - accuracy: 0.1226 - val_loss: 2.0796 - val_accuracy: 0.1250
Epoch 2/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1147 - val_loss: 2.0798 - val_accuracy: 0.1161
Epoch 3/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1190 - val_loss: 2.0800 - val_accuracy: 0.1161
Epoch 4/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1173 - val_loss: 2.0799 - val_accuracy: 0.1107
Epoch 5/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1183 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 6/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1226 - val_loss: 2.0801 - val_accuracy: 0.1107
Epoch 7/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1238 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 8/100
5040/5040 [==============================] - 54s 11ms/sample - loss: 2.0797 - accuracy: 0.1169 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 9/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1212 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 10/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1177 - val_loss: 2.0802 - val_accuracy: 0.1107

So yeah, any help on why my network seems to be just guessing? Thank you!

Lukasz Tracewski · Accepted Answer

The problem lies in the design of you network.

Typically you'd want in the first layers to learn high-level features and use larger kernel with odd size. Currently you're essentially interpolating neighbouring pixels. Why odd size? Read e.g. here.
Number of filters typically increases from small (e.g. 16, 32) number to larger values when going deeper into the network. In your network all layers learn the same number of filters. The reasoning is that the deeper you go, the more fine-grained features you'd like to learn - hence increase in number of filters.
In your ANN each layer also cuts out valuable information from the image (by default you are using valid padding).

Here's a very basic network that gets me after 40 seconds and 10 epochs over 95% training accuracy:

import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0

model = Sequential()
model.add(Conv2D(16, (5,5), activation = 'relu', input_shape = X.shape[1:], padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Dense(8, activation='softmax'))

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])

Architecture:

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_19 (Conv2D)           (None, 200, 200, 16)      1216      
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 100, 100, 16)      0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 100, 100, 32)      4640      
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 50, 50, 64)        18496     
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 25, 25, 64)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 40000)             0         
_________________________________________________________________
dense_7 (Dense)              (None, 512)               20480512  
_________________________________________________________________
dense_8 (Dense)              (None, 8)                 4104      
=================================================================
Total params: 20,508,968
Trainable params: 20,508,968
Non-trainable params: 0

Training:

Train on 5040 samples, validate on 560 samples
Epoch 1/10
5040/5040 [==============================] - 7s 1ms/sample - loss: 2.2725 - accuracy: 0.1897 - val_loss: 1.8939 - val_accuracy: 0.2946
Epoch 2/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.7831 - accuracy: 0.3375 - val_loss: 1.8658 - val_accuracy: 0.3179
Epoch 3/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.4857 - accuracy: 0.4623 - val_loss: 1.9507 - val_accuracy: 0.3357
Epoch 4/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.1294 - accuracy: 0.6028 - val_loss: 2.1745 - val_accuracy: 0.3250
Epoch 5/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.8060 - accuracy: 0.7179 - val_loss: 3.1622 - val_accuracy: 0.3000
Epoch 6/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.5574 - accuracy: 0.8169 - val_loss: 3.7494 - val_accuracy: 0.2839
Epoch 7/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3756 - accuracy: 0.8813 - val_loss: 4.9125 - val_accuracy: 0.2643
Epoch 8/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3001 - accuracy: 0.9036 - val_loss: 5.6300 - val_accuracy: 0.2821
Epoch 9/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.2345 - accuracy: 0.9337 - val_loss: 5.7263 - val_accuracy: 0.2679
Epoch 10/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.1549 - accuracy: 0.9581 - val_loss: 7.3682 - val_accuracy: 0.2732

As you can see, validation score is terrible, but the point was to demonstrate that poor architecture can prevent training altogether.

Convolutional Neural Network seems to be randomly guessing

Answers (1)

Related Questions