Reputation: 139
I am running a CNN for left and right shoeprint classfication. I have 190,000 training images and I use 10% of it for validation. My model is setup as shown below. I get the paths of all the images, read them in and resize them. I normalize the image, and then fit it to the model. My issue is that I have stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong that I am doing? How can I stop this from happening?
Just some interesting points to note:
I first tested this on 10 images I was having the same issue but changing the optimizer to adam and batch size to 4 worked.
I then tested on more and more images, but each time I would need to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and optimizer rmsprop. However, the accuracy and loss only really began to change after epoch 10.
I am now training on 190,000 images and I cannot increase the batch size as my GPU is at is max.
imageWidth = 50
imageHeight = 150
def get_filepaths(directory):
file_paths = []
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths
def cleanUpPaths(fullFilePaths):
cleanPaths = []
for f in fullFilePaths:
if f.endswith(".png"):
cleanPaths.append(f)
return cleanPaths
def getTrainData(paths):
trainData = []
for i in xrange(1,190000,2):
im = image.imread(paths[i])
im = image.imresize(im, (150,50))
im = (im-255)/float(255)
trainData.append(im)
trainData = np.asarray(trainData)
right = np.zeros(47500)
left = np.ones(47500)
trainLabels = np.concatenate((left, right))
trainLabels = np_utils.to_categorical(trainLabels)
return (trainData, trainLabels)
#create the convnet
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (1, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
#prepare the training data*/
trainPaths = get_filepaths("better1/train")
trainPaths = cleanUpPaths(trainPaths)
(trainData, trainLabels) = getTrainData(trainPaths)
trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
trainData = (trainData-255)/float(255)
#train the convnet***
model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
#/save the model and weights*/
model.save('myConvnet_model5.h5');
model.save_weights('myConvnet_weights5.h5');
Upvotes: 14
Views: 28048
Reputation: 13
This may help those coming here from Google:
You may want to check your tensorflow version.
I was having a bad time trying to train a CNN that, no matter what I tried, did not converge. I tried different learning rates, optimizers, architectures, you name it. At this point, I was using my local machine, in which I had set up a conda environment with tensorflow (using conda create -n tf-gpu tensorflow-gpu
), which installed tensorflow version 2.4.1.
Then, I decided to try training the same model with the same dataset on Google Colab, which had tf version 2.15.0 and right on the first try the model converged beautifully.
I'm not sure what caused this behavior.
Upvotes: 0
Reputation: 7148
I would try a couple of things. A lower learning rate should help with more data. Generally, adapting the optimizer should help. Additionally your network seems really small, you might want to increase the capacity of the model by adding layers or increasing the number of filters in the layers.
A better description on how to apply deep learning in practice is given here.
Upvotes: 4
Reputation: 71
You can try to add a BatchNornmalization()
layer after MaxPooling2D()
. It works for me.
Upvotes: 7
Reputation: 737
I just have 2 things more to add to the great list of DBCerigo.
linear
activation function by default, if you do not insert some non linearity into your model it wont be able to generalize, so the net will try to learn how to separate linearly a feature space that is not linear. Making sure you have your non linearity set is a good checkpoint.Although the 2nd one may be obvious, I run into his problem once and I lost lots of time checking everythin (data, batches, LR...) before figuring out.
Hope this helps
Upvotes: 4
Reputation: 15
in my case it is the activification function matters. I change from 'sgd' to 'a'
Upvotes: -4
Reputation: 635
I've had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.
Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*
Confirming issue is occurring: Method 1: accuracy for model stays around 0.5 while training (or 1/n where n is number of classes). Method 2: Get the counts of each class in predictions and confirm it's predicting all one class.
Fixes/Checks (in somewhat of an order):
model.summary()
, inspect the model.ImageDataGenerator().flow_from_directory(PATH)
, check that param shuffle=True
and that batch_size
is greater than 1.for layer in pretrained_model.layers: layer.trainable = False
should be somewhere in your code.lr=1e-6
, so keep going!)If any of you know of more fixes/checks that could possible get the model training properly then please do contribute and I'll try to update the list.
**Note that is common to make more of the pretrained model trainable, once the new layers have been initially trained "enough"
*Other names for the issue to help searches get here... keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output
Upvotes: 27