keras stuck during optimization

Question

After trying the Keras example on CIFAR10, I decided to go for something bigger: a VGG-like net on the Tiny ImageNet dataset. This is a subset of the ImageNet dataset with 200 classes (instead of 1000) and 100K images downscaled to 64x64.

I got the VGG-like model from the file vgg_like_convnet.py here. Unfortunately, things are going pretty much like here, except that this time neither changing the learning rate nor swapping Theano (TH) for TensorFlow (TF) helps. Changing the optimizer does not help either (see the code below).

Accuracy is basically stuck at 0.005, which, as was pointed out, is what you would expect from completely random answers with 200 classes. Worse, if by a fluke of weight initialization it starts at, say, 0.007, it quickly converges to 0.005 and stays firmly there for every subsequent epoch.
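For reference, the chance-level numbers for a 200-way softmax can be computed directly. This plain-Python sketch (nothing Keras-specific) shows that uniform guessing gives an accuracy of 1/200 = 0.005 and a categorical cross-entropy of ln(200), roughly 5.30, so a training loss sitting near that value is another sign that the network is outputting a near-uniform distribution.

```python
import math

nb_classes = 200

# Accuracy of a classifier that picks one of the 200 classes uniformly at random
chance_accuracy = 1.0 / nb_classes

# Cross-entropy of a softmax that gives probability 1/200 to every class:
# -log(1/200) = log(200), whatever the true label is
uniform_loss = -math.log(1.0 / nb_classes)

print('chance accuracy: %.3f' % chance_accuracy)
print('uniform-prediction loss: %.3f' % uniform_loss)
```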

The Keras code (TH version) is below:

from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.regularizers import l2, activity_l2, l1, activity_l1
from keras.optimizers import SGD, Adam, Adagrad, Adadelta
from keras.utils import np_utils
import numpy as np
import cPickle as pickle

# seed = 7
# np.random.seed(seed)

batch_size = 64
nb_classes = 200
nb_epoch = 30

# input image dimensions
img_rows, img_cols = 64, 64
# the tiny image net images are RGB
img_channels = 3

# Load the train dataset for TH
print('Load training data')
X_train=pickle.load(open('xtrain_shu_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_shu_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

# Load the test dataset for TH
print('Load validation data')
X_test=pickle.load(open('xtest_th.p','rb')) # np.zeros((10000,3,64,64)).astype('uint8')
y_test=pickle.load(open('ytest_th.p','rb')) # np.zeros((10000,1)).astype('uint8')

# the data, shuffled and split between train and test sets
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(ZeroPadding2D((1,1),input_shape=(3,64,64)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu',))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_6'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_8'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_11'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_13'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_15'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_18'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_20'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_22'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(200, activation='softmax'))

# let's train the model using SGD + momentum (how original).

opt = SGD(lr=0.0001, decay=1e-6, momentum=0.7, nesterov=True)
# opt= Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# opt = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
# opt = Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print('Optimization....')
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)

# Save the resulting model
model.save('model.h5')
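As a sanity check on the architecture above, the spatial size can be traced by hand: each ZeroPadding2D((1,1)) followed by a 3x3 Convolution2D keeps the size unchanged, and each of the five 2x2 max-pools halves it. The sketch below only does that arithmetic; it assumes 'valid' convolutions (the Keras 1 default border_mode) and that no other layer changes the size. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    # Output size of a convolution after symmetric zero padding
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    # Output size of a non-overlapping max-pool
    return (size - pool) // stride + 1

# (number of conv layers, number of filters) for each VGG block in the model
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size = 64
for n_convs, filters in blocks:
    for _ in range(n_convs):
        size = conv_out(size)
    size = pool_out(size)
    print('after block with %d filters: %dx%d' % (filters, size, size))

# Features fed into the first Dense(4096) layer
flatten_size = size * size * blocks[-1][1]
print('Flatten output:', flatten_size)
```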

The Tiny ImageNet dataset consists of JPEG images, which I converted to PPM with djpeg. I then created a large binary file containing, for each image, the class label (1 byte) followed by the 64x64x3 bytes of pixel data.

Reading this file from Keras was excruciatingly slow, so (I'm very new to Python, so this might sound dumb) I decided to fill a 4D NumPy array of shape (100000,3,64,64) for TH, or (100000,64,64,3) for TF, with the dataset and pickle it. Loading the dataset into the array now takes about 40 s when I run the code above.
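A fixed-size record file like this can also be read without a Python loop: NumPy can load the whole file in one call and reshape it. This is a sketch assuming the layout described above (one label byte followed by 64x64x3 pixel bytes in PPM order, i.e. row by row with interleaved RGB); the helper `load_records` and the file name `train.bin` are hypothetical. The final transpose produces the channels-first (TH) layout; pass `channels_first=False` for TF.

```python
import numpy as np

IMG_ROWS, IMG_COLS, IMG_CHANNELS = 64, 64, 3
RECORD_SIZE = 1 + IMG_ROWS * IMG_COLS * IMG_CHANNELS  # label byte + pixel bytes

def load_records(path, channels_first=True):
    # Read the whole file as raw bytes and view it as one row per image
    raw = np.fromfile(path, dtype=np.uint8)
    if raw.size % RECORD_SIZE != 0:
        raise ValueError('file size is not a multiple of the record size')
    records = raw.reshape(-1, RECORD_SIZE)

    # First byte of each record is the class label, kept as an (N, 1) column
    y = records[:, :1].copy()
    # Remaining bytes are PPM-ordered pixels: rows, then columns, then RGB
    X = records[:, 1:].reshape(-1, IMG_ROWS, IMG_COLS, IMG_CHANNELS)
    if channels_first:
        X = X.transpose(0, 3, 1, 2)  # (N, 64, 64, 3) -> (N, 3, 64, 64)
    return np.ascontiguousarray(X), y

# Hypothetical usage:
# X_train, y_train = load_records('train.bin')
```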

I even checked that the pickled array contained the data in the right order with the code below:

import numpy as np
import cPickle as pickle

print("Reading data")
pix=pickle.load(open('xtrain_th.p','rb'))
print("Done")

img=67857

f=open('img'+str(img)+'.ppm','wb')
f.write('P6\n64 64\n255\n')

for y in range(0,64):
    for x in range(0,64):
        f.write(chr(pix[img][0][y][x]))
        f.write(chr(pix[img][1][y][x]))
        f.write(chr(pix[img][2][y][x]))
f.close()

This writes one image from the array back out as a PPM file.
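The same round-trip check can be written without the per-pixel loop. This is a Python 3 sketch assuming a channels-first uint8 image of shape (3, 64, 64), as in the TH array; `write_ppm` is a hypothetical helper. Transposing to (rows, columns, channels) puts the bytes in exactly the order the binary P6 format expects.

```python
import numpy as np

def write_ppm(path, img_chw):
    # img_chw: uint8 array of shape (3, rows, cols), channels first (TH layout)
    img_hwc = np.ascontiguousarray(img_chw.transpose(1, 2, 0))  # -> (rows, cols, 3)
    rows, cols, _ = img_hwc.shape
    header = 'P6\n%d %d\n255\n' % (cols, rows)  # PPM lists width before height
    with open(path, 'wb') as f:
        f.write(header.encode('ascii'))
        f.write(img_hwc.tobytes())

# Hypothetical usage, mirroring the loop above:
# write_ppm('img67857.ppm', pix[67857])
```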

Finally, I noticed that the training set was too ordered (the first 500 images all belonged to class 0, the next 500 to class 1, and so on), so I shuffled it with the code below:

# Dataset preparation for Theano backend
import cPickle as pickle
import numpy as np
import random as rnd

n=100000

print('Load training data')
X_train=pickle.load(open('xtrain_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

# Shuffle the data with n random pairwise swaps
print('Shuffling training data')
for _ in range(0,n):
    i=rnd.randrange(n)
    j=rnd.randrange(n)
    # .copy() is required: X_train[i] is a view into X_train, so without it
    # image i is overwritten by image j before it can be moved to slot j
    tmpa=X_train[i].copy()
    X_train[i]=X_train[j]
    X_train[j]=tmpa
    tmp=y_train[i][0]
    y_train[i][0]=y_train[j][0]
    y_train[j][0]=tmp

print('Pickle dump')
pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))
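An alternative to the swap loop is to draw a single random permutation and index both arrays with it, which keeps each image paired with its label and gives a uniform shuffle. A minimal sketch, assuming `X` and `y` are NumPy arrays with the same first dimension; `shuffle_together` is a hypothetical name. Indexing with an integer array returns a copy, so there is no view aliasing of the kind `tmpa=X_train[i]` runs into.

```python
import numpy as np

def shuffle_together(X, y, seed=None):
    # One permutation applied to both arrays keeps image/label pairs aligned
    if len(X) != len(y):
        raise ValueError('X and y must have the same number of samples')
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(X))
    return X[perm], y[perm]  # integer-array indexing returns copies

# Hypothetical usage:
# X_train, y_train = shuffle_together(X_train, y_train, seed=7)
```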

Nothing helped. I wasn't expecting 99% accuracy on the first attempt, but I did expect at least some progress before a plateau.

I wanted to try TFLearn, but it had a pending bug when I looked a few days ago.

Any ideas? Thanks in advance.

Thomas Pinetz · Accepted Answer

You can use the built-in shuffle of the Keras model API (https://keras.io/models/model/#fit): just set the shuffle parameter to True. It supports both batch shuffling (shuffle='batch') and global shuffling (shuffle=True); the default is global shuffling.
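To make the difference between the two modes concrete: global shuffling permutes individual sample indices, while shuffle='batch' only permutes batch-sized blocks of consecutive indices (the Keras docs describe it as a special option for dealing with the limitations of HDF5 data). The sketch below is a simplified reimplementation of that idea for illustration, not the Keras source; `batch_shuffle_sketch` is a hypothetical name, and leftover indices that do not fill a whole batch are kept at the end.

```python
import numpy as np

def batch_shuffle_sketch(index_array, batch_size, rng=np.random):
    # Split the indices into full batches plus a shorter remainder
    n_full = len(index_array) // batch_size
    head = index_array[:n_full * batch_size].reshape(n_full, batch_size)
    tail = index_array[n_full * batch_size:]
    # Shuffle the order of whole batches; order inside each batch is unchanged
    head = head[rng.permutation(n_full)]
    return np.concatenate([head.ravel(), tail])

index_array = np.arange(10)
print(batch_shuffle_sketch(index_array, batch_size=3))  # blocks of 3 reordered, 9 stays last
print(np.random.permutation(index_array))               # global shuffle of single indices
```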

One thing to note, though, is that with validation_split, fit selects the validation samples (the last ones in the arrays) before any shuffling takes place. So if you want the validation data to be a shuffled sample too, I would advise shuffling the arrays yourself beforehand with sklearn.utils.shuffle (http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html).
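Because validation_split takes the last fraction of the arrays before shuffling, data sorted by class (as the training set here originally was) would put whole classes in the validation set and none in training. A minimal NumPy sketch of shuffling first and then holding out the tail in the same way; `sklearn.utils.shuffle(X, y)` performs the shuffling step in one call. The helper name `shuffle_then_split` and the 10% fraction are illustrative assumptions.

```python
import numpy as np

def shuffle_then_split(X, y, val_fraction=0.1, seed=None):
    # Shuffle images and labels with the same permutation
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]
    # Like validation_split, hold out the last fraction of the (now shuffled) data
    n_val = int(len(X) * val_fraction)
    split = len(X) - n_val
    return (X[:split], y[:split]), (X[split:], y[split:])

# Hypothetical usage, passing validation_data instead of validation_split:
# (X_tr, Y_tr), (X_val, Y_val) = shuffle_then_split(X_train, Y_train, 0.1, seed=7)
# model.fit(X_tr, Y_tr, validation_data=(X_val, Y_val), shuffle=True)
```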

From the Keras source on GitHub:

if shuffle == 'batch':
    index_array = batch_shuffle(index_array, batch_size)              
elif shuffle:
    random.shuffle(index_array)
