I am using the LeNet architecture below to train my image classification model, and I have noticed that neither the training accuracy nor the validation accuracy improves from epoch to epoch. Can anyone with expertise in this area explain what might have gone wrong?
Training samples: 110 images belonging to 2 classes. Validation samples: 50 images belonging to 2 classes.
#LeNet
import keras
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
#import dropout class if needed
from keras.layers import Dropout
from keras import regularizers
model = Sequential()
#Layer 1
#Conv Layer 1
model.add(Conv2D(filters = 6,
kernel_size = 5,
strides = 1,
activation = 'relu',
input_shape = (32,32,3)))
#Pooling layer 1
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Layer 2
#Conv Layer 2
model.add(Conv2D(filters = 16,
kernel_size = 5,
strides = 1,
activation = 'relu',
input_shape = (14,14,6)))
#Pooling Layer 2
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Flatten
model.add(Flatten())
#Layer 3
#Fully connected layer 1
model.add(Dense(units=128,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 4
#Fully connected layer 2
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#layer 5
#Fully connected layer 3
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#layer 6
#Fully connected layer 4
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 7
#Output Layer
model.add(Dense(units = 2, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
from keras.preprocessing.image import ImageDataGenerator
#Image Augmentation
train_datagen = ImageDataGenerator(
rescale=1./255, # rescale pixel values to the range [0, 1]
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
#Just Feature scaling
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
'/Dataset/Skin_cancer/training',
target_size=(32, 32),
batch_size=32,
class_mode='categorical')
test_set = test_datagen.flow_from_directory(
'/Dataset/Skin_cancer/testing',
target_size=(32, 32),
batch_size=32,
class_mode='categorical')
model.fit_generator(
training_set,
steps_per_epoch=50, # batches drawn per epoch (not the number of images)
epochs=25,
validation_data=test_set,
validation_steps=10) # validation batches per epoch (not the number of samples)
Epoch 1/25
50/50 [==============================] - 52s 1s/step - loss: 0.8568 - accuracy: 0.4963 - val_loss: 0.7004 - val_accuracy: 0.5000
Epoch 2/25
50/50 [==============================] - 50s 1s/step - loss: 0.6940 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/25
50/50 [==============================] - 48s 967ms/step - loss: 0.6932 - accuracy: 0.5065 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 4/25
50/50 [==============================] - 50s 1s/step - loss: 0.6932 - accuracy: 0.4824 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 5/25
50/50 [==============================] - 49s 974ms/step - loss: 0.6932 - accuracy: 0.4949 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4854 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 7/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5015 - val_loss: 0.6918 - val_accuracy: 0.5000
Epoch 8/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4986 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 9/25
50/50 [==============================] - 49s 973ms/step - loss: 0.6932 - accuracy: 0.5000 - val_loss: 0.6929 - val_accuracy: 0.5000
Epoch 10/25
50/50 [==============================] - 50s 1s/step - loss: 0.6931 - accuracy: 0.5044 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 11/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5022 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 12/25
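Two quick checks on the log and the fit_generator settings above, written as plain Python arithmetic. The dataset sizes and batch size come from the question. A loss stuck at about 0.6931 equals ln 2, the cross-entropy of assigning probability 0.5 to the true class, which means the network is giving essentially the same answer for every image. Separately, steps_per_epoch and validation_steps count batches, not images:

```python
import math

# Cross-entropy when the true class gets probability 0.5 (a chance-level guess)
chance_loss = -math.log(0.5)
print(f"chance-level loss: {chance_loss:.4f}")  # 0.6931, the plateau in the log

# Sizes from the question
train_images, val_images, batch_size = 110, 50, 32

# Batches needed to see every image once (the last batch is partial)
train_batches = math.ceil(train_images / batch_size)
val_batches = math.ceil(val_images / batch_size)
print(train_batches, val_batches)  # 4 2

# steps_per_epoch=50 therefore asks for about 12.5 passes over the training images
passes_per_epoch = 50 / train_batches
print(passes_per_epoch)  # 12.5
```

Likewise, validation_steps=10 requests more batches than the 2 that the 50 validation images fill. These settings do not by themselves explain the plateau, but they show that each "epoch" here is not a single pass over the data.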
Most importantly, you are using loss = 'categorical_crossentropy'; since you have just 2 classes, change it to loss = 'binary_crossentropy'. Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
As @desertnaut rightly mentioned, categorical_crossentropy goes hand in hand with a softmax activation in the last layer. If you change the loss to binary_crossentropy, the last layer should also be changed to a single unit with a sigmoid activation, i.e. Dense(units = 1, activation = 'sigmoid'), because class_mode='binary' yields one 0/1 label per image.
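For two classes, both pairings describe the same probability model. A 2-unit softmax assigns class 1 the probability sigmoid(z1 - z0), so the two losses agree. The real requirement is that the label format (class_mode), the number of output units, the activation and the loss stay consistent with each other. Here is a minimal pure-Python check; the logits z0 and z1 are arbitrary example values, and the helper functions are illustrative rather than Keras APIs:

```python
import math

def softmax2(z0, z1):
    """Probability of class 1 under a 2-unit softmax."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e1 / (e0 + e1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def categorical_ce(p1, label):
    """Categorical cross-entropy for a one-hot 2-class label (label is 0 or 1)."""
    return -math.log(p1 if label == 1 else 1.0 - p1)

def binary_ce(p, label):
    """Binary cross-entropy for a single sigmoid output and a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

z0, z1 = 0.3, 1.1                 # arbitrary example logits
p_softmax = softmax2(z0, z1)      # 2-unit softmax output for class 1
p_sigmoid = sigmoid(z1 - z0)      # 1-unit sigmoid on the logit difference
print(abs(p_softmax - p_sigmoid) < 1e-12)  # True
for label in (0, 1):
    print(label, categorical_ce(p_softmax, label), binary_ce(p_sigmoid, label))
```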
Other improvements: use the horizontal_flip, vertical_flip, shear_range and zoom_range options of ImageDataGenerator to augment the training images and effectively enlarge your small training set.
Moving the comments to the answer section, as suggested by @desertnaut:
Question - Thanks! Yes, I figured the small amount of data is the problem. One additional question: why does adding more dense layers than conv layers hurt the model? Is there any rule to follow when deciding how many conv and dense layers to use? – Arun_Ramji_Shanmugam 2 days ago
Answer - To answer the first part of your question: a Conv2D layer preserves the spatial information of the image, and the number of weights it has to learn depends only on the kernel size, the number of input channels and the number of filters. A Dense layer, in contrast, needs the output of the Conv2D layers to be flattened, which loses the spatial information. Dense layers also add many more weights; for example, two consecutive dense layers of 512 units add 512*512 = 262144 weights (plus biases) to the model, all of which have to be learned. That means you have to train for more epochs, with good hyperparameter settings, to learn these weights. – Tensorflow Warriors 2 days ago
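The parameter counts behind that comment can be worked out directly. Here is a minimal sketch using the standard formulas for Conv2D and Dense layers with biases, applied to the model in the question (default 'valid' padding, 2x2 pooling). The layer names in the dictionary are only labels, and model.summary() should report the same per-layer counts:

```python
def conv2d_params(kernel, in_channels, filters):
    # one kernel x kernel x in_channels weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # full weight matrix plus one bias per unit
    return inputs * units + units

# LeNet from the question: 32x32x3 input, 5x5 convs with 'valid' padding, 2x2 pooling
layers = {
    "conv1": conv2d_params(5, 3, 6),            # 28x28x6 -> pool -> 14x14x6
    "conv2": conv2d_params(5, 6, 16),           # 10x10x16 -> pool -> 5x5x16
    "dense128": dense_params(5 * 5 * 16, 128),  # Flatten gives 400 inputs
    "dense64_a": dense_params(128, 64),
    "dense64_b": dense_params(64, 64),
    "dense64_c": dense_params(64, 64),
    "output": dense_params(64, 2),
}
conv_total = layers["conv1"] + layers["conv2"]
total = sum(layers.values())
print(layers)
print(conv_total, total)  # 2872 70906

# The 512-unit example from the comment, including biases
print(dense_params(512, 512))  # 262656
```

Roughly 96% of this model's weights sit in the dense layers, with only 110 training images to fit them.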
Answer - To answer the second part of your question: use systematic experiments to discover what works best for your specific dataset. It also depends on the processing power you have. Deeper networks are often better, but they need more data and are harder to train. A conventional approach is to look for similar problems and deep learning architectures that have already been shown to work. You also have the option of using pretrained models like ResNet, VGG etc.: freeze some of their layers and train the remaining ones. – Tensorflow Warriors 2 days ago
Question - Thank you for the detailed answer! If you don't mind, one more question: when we use an already trained model (maybe only some of its layers), doesn't it have to have been trained on the same kind of input data as the data we are going to work with? – Arun_Ramji_Shanmugam yesterday
Answer - The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world, so its learned features can be reused on a different dataset. You can find a transfer learning example with an explanation here: tensorflow.org/tutorials/images/transfer_learning. – Tensorflow Warriors yesterday
Remove all kernel_initializer='uniform' arguments from your layers; don't specify anything there. The default initializer, glorot_uniform, is the highly recommended one (and uniform is a particularly bad one).
As a general rule, keep in mind that the default values for such rather advanced settings are there for your convenience; they are implicitly recommended, and you had better not mess with them unless you have specific reasons to do so and know exactly what you are doing.
For the kernel_initializer argument in particular, I have started believing that it has caused a lot of unnecessary pain to people (just see here for the most recent example).
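Here is a rough way to see why the initializer matters for this model. Assume Keras's 'uniform' alias resolves to RandomUniform with its default range [-0.05, 0.05]. Assume also the common ReLU heuristic that each layer scales the signal variance by about fan_in * Var(w) / 2. That can then be compared with glorot_uniform, whose limit is sqrt(6 / (fan_in + fan_out)). The layer sizes are the four hidden Dense layers from the question. Biases, dropout and the L2 penalty are ignored, so this is only an order-of-magnitude sketch:

```python
import math

def uniform_var(limit):
    # variance of a uniform distribution on [-limit, limit]
    return limit ** 2 / 3

def glorot_limit(fan_in, fan_out):
    return math.sqrt(6 / (fan_in + fan_out))

# (fan_in, fan_out) of the four hidden Dense layers in the question
dense_layers = [(400, 128), (128, 64), (64, 64), (64, 64)]

def relu_gain(var_fn):
    """Product over layers of the heuristic variance factor fan_in * Var(w) / 2."""
    gain = 1.0
    for fan_in, fan_out in dense_layers:
        gain *= fan_in * var_fn(fan_in, fan_out) / 2
    return gain

# Assumption: 'uniform' means RandomUniform(-0.05, 0.05)
uniform_gain = relu_gain(lambda fi, fo: uniform_var(0.05))
glorot_gain = relu_gain(lambda fi, fo: uniform_var(glorot_limit(fi, fo)))
print(f"'uniform':      {uniform_gain:.2e}")  # about 6.3e-06
print(f"glorot_uniform: {glorot_gain:.2e}")   # about 1.3e-01
```

Under these assumptions, the 'uniform' setup shrinks the signal that reaches the output layer by a factor of roughly 1e-5. That fits with the network settling on the same prediction for every image.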
Also, dropout should not be used by default, especially in cases like this one where the model seems to struggle to learn anything. Start without any dropout (comment out the respective layers), and only add it back if you see signs of overfitting.