Reputation: 7440
I am trying to train a Keras CNN on the Street View House Numbers dataset. You can find the project here. The problem is that during training neither the loss nor the accuracy changes over time. I have tried 1-channel (grayscale) and RGB (3-channel) images, larger (50, 50) and smaller (28, 28) images, more and fewer filters in the convolutional layers, wider and narrower pooling patches, with and without dropout, bigger and smaller batches, larger and smaller learning rates, and different optimizers.
Still, training gets stuck at a constant loss and accuracy.
Here is how I prepared the data:
import os
import numpy as np
from PIL import Image
from PIL import ImageFilter
train_folders = 'sv_train/train'
test_folders = 'test'
extra_folders = 'extra'
SV_IMG_SIZE = 28
SV_CHANNELS = 3
train_imsize = np.ndarray([len(train_data),2])
k = 500
sv_images = []
max_images = 20000  # len(train_data)
max_digits = 5
sv_labels = np.ones([max_images, max_digits], dtype=int) * 10  # init to 10, meaning "no digit"
nboxes = [[] for i in range(max_images)]
print ("%d to load" % len(train_data))
def getBBox(i, perc):
    boxes = train_data[i]['boxes']
    x_min = 9990
    y_min = 9990
    x_max = 0
    y_max = 0
    for bid, b in enumerate(boxes):
        x_min = b['left'] if b['left'] <= x_min else x_min
        y_min = b['top'] if b['top'] <= y_min else y_min
        x_max = b['left'] + b['width'] if b['left'] + b['width'] >= x_max else x_max
        y_max = b['top'] + b['height'] if b['top'] + b['height'] >= y_max else y_max
    dy = y_max - y_min
    dx = x_max - x_min
    dpy = dy * perc
    dpx = dx * perc
    nboxes[i] = [dpx, dpy, dx, dy]
    return x_min - dpx, y_min - dpy, x_max + dpx, y_max + dpy
for i in range(max_images):
    print(" \r%d" % i, end="")
    filename = train_data[i]['filename']
    fullname = os.path.join(train_folders, filename)
    boxes = train_data[i]['boxes']
    label = [10, 10, 10, 10, 10]
    lb = len(boxes)
    if lb <= max_digits:
        im = Image.open(fullname)
        x_min, y_min, x_max, y_max = getBBox(i, 0.3)
        im = im.crop([x_min, y_min, x_max, y_max])
        owidth, oheight = im.size
        wr = SV_IMG_SIZE / float(owidth)
        hr = SV_IMG_SIZE / float(oheight)
        for bid, box in enumerate(boxes):
            sv_labels[i][max_digits - lb + bid] = int(box['label'])
        box = nboxes[i]
        box[0] *= wr
        box[1] *= wr
        box[2] *= hr
        box[3] *= hr
        im = im.resize((SV_IMG_SIZE, SV_IMG_SIZE), Image.ANTIALIAS)
        array = np.asarray(im)
        array = array.reshape((SV_IMG_SIZE, SV_IMG_SIZE, SV_CHANNELS)).astype(np.float32)
        na = np.zeros([SV_IMG_SIZE, SV_IMG_SIZE, SV_CHANNELS], dtype=int)
        sv_images.append(array.astype(np.float32))
Here is the model:
from keras.optimizers import Adam
from keras.utils.np_utils import to_categorical
adam = Adam(lr=0.5)
model = Sequential()
x = Input((SV_IMG_SIZE, SV_IMG_SIZE,SV_CHANNELS))
y = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
y = Convolution2D(32, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Convolution2D(128, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Flatten()(y)
y = Dense(512, activation='relu')(y)
digit1 = Dense(11, activation="softmax")(y)
digit2 = Dense(11, activation="softmax")(y)
digit3 = Dense(11, activation="softmax")(y)
digit4 = Dense(11, activation="softmax")(y)
digit5 = Dense(11, activation="softmax")(y)
model = Model(input=x, output=[digit1, digit2, digit3,digit4,digit5])
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
sv_train_labels = [to_categorical(svt_labels[:, 0]),
                   to_categorical(svt_labels[:, 1]),
                   to_categorical(svt_labels[:, 2]),
                   to_categorical(svt_labels[:, 3]),
                   to_categorical(svt_labels[:, 4])]
sv_validation_labels = [to_categorical(svv_labels[:, 0]),
                        to_categorical(svv_labels[:, 1]),
                        to_categorical(svv_labels[:, 2]),
                        to_categorical(svv_labels[:, 3]),
                        to_categorical(svv_labels[:, 4])]
model.fit(sv_train, sv_train_labels, nb_epoch=50, batch_size=8,validation_data=(sv_validation, sv_validation_labels))
Upvotes: 4
Views: 6403
Reputation: 3842
As per my comment above, I'd suggest avoiding training a single model to predict the 5-digit combination. It is far more efficient to train the model to predict one digit at a time. I built a quick example, based on the Keras cifar10_cnn.py example, on the MNIST-like SVHN format 2 (cropped digits):
import numpy as np
import scipy.io as sio
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils.np_utils import to_categorical
# parameters
nb_epoch = 10
batch_size = 32
# load data
nb_classes = 10
train_data = sio.loadmat('train_32x32.mat')
test_data = sio.loadmat('test_32x32.mat')
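# X arrives as (32, 32, 3, N); .T reorders it to (N, 3, 32, 32) and the division scales pixels to [0, 1]
# SVHN labels the digit 0 as 10; the modulo below folds it back to class 0 so labels run 0-9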
X_train = train_data['X'].T / 255
X_test = test_data['X'].T / 255
y_train = to_categorical(train_data['y'] % nb_classes)
y_test = to_categorical(test_data['y'] % nb_classes)
# model
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# train
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, y_test), shuffle=True)
Once you have trained the model, train another model to recognize/extract each number from an image, using a library such as OpenCV.
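To make that last step concrete, here is a rough sketch (not part of the original answer; the Otsu thresholding heuristic, the minimum box area, and the 32x32 crop size are arbitrary choices) of how OpenCV could pull candidate digit regions out of an image before handing each crop to the single-digit classifier:
import cv2
import numpy as np

def extract_digit_crops(image_path, crop_size=32):
    """Return square crops of candidate digit regions, ordered left to right."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding; assumes the digits contrast with the background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # findContours returns 2 or 3 values depending on the OpenCV version, hence [-2]
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(c) for c in contours]
    boxes = [b for b in boxes if b[2] * b[3] > 50]   # drop tiny noise regions
    boxes.sort(key=lambda b: b[0])                   # left-to-right reading order
    crops = []
    for x, y, w, h in boxes:
        crop = cv2.resize(img[y:y + h, x:x + w], (crop_size, crop_size))
        crops.append(crop.astype(np.float32) / 255)
    return crops
In practice a proper detector (MSER, a sliding window, or the bounding boxes that ship with SVHN format 1) tends to be more robust on street-view images than plain thresholding.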
Upvotes: 2
Reputation: 11543
Why are the labels for 104 [10 10 1 10 4]? I believe they should be [10 10 1 0 4], no?
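For reference, a minimal sketch of the padding this expects, using 10 as the "no digit" filler from the question's code (the digits of 104 are written out by hand here):
max_digits = 5
digits = [1, 0, 4]  # the digits of "104"
label = [10] * (max_digits - len(digits)) + digits
print(label)        # [10, 10, 1, 0, 4]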
In my opinion, either you have a problem with the input data (the preparation might be wrong), or your architecture is not suitable for this problem.
It is training: you can see in the notebook that the loss changes between epoch 1 and 2, so it is not a training issue.
Upvotes: 0
Reputation: 7148
In cases like this it is most often a wrong training set. I would recommend taking a look at the actual images and labels you feed into the network. Also look at the colorbar of the images, i.e. how their pixel values are distributed. This often leads to the solution. In any case, if you are able to map them, then so will the computer, given a good learning rate.
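A minimal sketch of that kind of sanity check, reusing the sv_images / sv_labels names from the question (the number of samples shown and the plotting layout are arbitrary):
import matplotlib.pyplot as plt
import numpy as np

def inspect_samples(images, labels, n=6):
    """Show a few images with their label rows, then the pixel-value distribution."""
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, img, lab in zip(axes, images[:n], labels[:n]):
        arr = np.squeeze(np.asarray(img))
        ax.imshow(arr / 255.0 if arr.max() > 1 else arr)  # imshow expects floats in [0, 1]
        ax.set_title(str(lab), fontsize=8)
        ax.axis('off')
    plt.show()
    # the histogram shows whether the pixel values are still in 0-255 or already scaled
    plt.hist(np.asarray(images[:n]).ravel(), bins=50)
    plt.xlabel('pixel value')
    plt.show()

inspect_samples(sv_images, sv_labels)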
Upvotes: 1