Shin Yami

Reputation: 9

Neural Network does not perform well on the CIFAR-10 dataset

I have been trying to implement a CNN on the CIFAR-10 dataset for a few days, but my test set accuracy does not go beyond 10% and the loss just hangs around 69.07733. I have been tweaking the model for days, but in vain, and I haven't been able to spot where I am going wrong. Please help me find the fault in the model. Here is the code for it:

import os
import sys
import pickle
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt

data_root = './cifar-10-batches-py'
train_data = np.ndarray(shape=(50000,3072), dtype=np.float32)
train_labels = np.ndarray(shape=(50000), dtype=np.float32)
num_images = 0
test_data = np.ndarray(shape=(10000,3072),dtype = np.float32)
test_labels = np.ndarray(shape=(10000),dtype=np.float32)
meta_data = {}

for file in os.listdir(data_root):
    file_path = os.path.join(data_root,file)
    with open(file_path,'rb') as f:
        temp = pickle.load(f,encoding ='bytes')
        if file == 'batches.meta':
            for i,j in enumerate(temp[b'label_names']):
                meta_data[i] = j
        if 'data_batch_' in file:
            for i in range(10000):
                train_data[num_images,:] = temp[b'data'][i]
                train_labels[num_images] = temp[b'labels'][i]
                num_images += 1
        if 'test_batch' in file:
            for i in range(10000):
                test_data[i,:] = temp[b'data'][i]
                test_labels[i] = temp[b'labels'][i]



'''         
print('meta: \n',meta_data)
train_data = train_data.reshape(50000,3,32,32).transpose(0,2,3,1)
print('\ntrain data: \n',train_data.shape,'\nLabels: \n',train_labels[0])
print('\ntest data: \n',test_data[0].shape,'\nLabels: \n',train_labels[0])'''


#accuracy function acc = (no. of correct prediction/total attempts) * 100
def accuracy(predictions, labels):
    return (100 * (np.sum(np.argmax(predictions,1)== np.argmax(labels, 1))/predictions.shape[0]))

#reformat the data
def reformat(data,labels):
    data = data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1).astype(np.float32)
    labels = (np.arange(10) == labels[:,None]).astype(np.float32)
    return data,labels


train_data, train_labels = reformat(train_data,train_labels)
test_data, test_labels = reformat(test_data, test_labels)
print ('Train ',train_data[0][1])

plt.axis("off")
plt.imshow(train_data[1], interpolation = 'nearest')
plt.savefig("1.png")
plt.show()

'''
print("Train: \n",train_data.shape,test_data[0],"\nLabels: \n",train_labels.shape,train_labels[:11])
print("Test: \n",test_data.shape,test_data[0],"\nLabels: \n",test_labels.shape,test_labels[:11])'''

image_size = 32
num_channels = 3
batch_size = 30
patch_size = 5
depth = 64
num_hidden = 256
num_labels = 10

graph = tf.Graph()

with graph.as_default():

    #input data and labels
    train_input = tf.placeholder(tf.float32,shape=(batch_size,image_size,image_size,num_channels))
    train_output = tf.placeholder(tf.float32,shape=(batch_size,num_labels))
    test_input = tf.constant(test_data)

    #layer weights and biases
    layer_1_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels,depth]))
    layer_1_biases = tf.Variable(tf.zeros([depth]))

    layer_2_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,depth,depth]))
    layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth]))

    layer_3_weights = tf.Variable(tf.truncated_normal([64*64, num_hidden]))
    layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))

    layer_4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
    layer_4_biases = tf.Variable(tf.constant(0.1, shape=[num_labels]))

    def convnet(data):
        conv_1 = tf.nn.conv2d(data, layer_1_weights,[1,1,1,1], padding = 'SAME')
        hidden_1 = tf.nn.relu(conv_1+layer_1_biases)
        norm_1 = tf.nn.lrn(hidden_1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_1 = tf.nn.max_pool(norm_1,[1,2,2,1],[1,2,2,1], padding ='SAME')
        conv_2 = tf.nn.conv2d(pool_1,layer_2_weights,[1,1,1,1], padding = 'SAME')
        hidden_2 = tf.nn.relu(conv_2+layer_2_biases)
        norm_2 = tf.nn.lrn(hidden_2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_2 = tf.nn.max_pool(norm_2,[1,2,2,1],[1,2,2,1], padding ='SAME')
        shape = pool_2.get_shape().as_list()
        hidd2_trans = tf.reshape(pool_2,[shape[0],shape[1]*shape[2]*shape[3]])
        hidden_3 = tf.nn.relu(tf.matmul(hidd2_trans,layer_3_weights) + layer_3_biases)
        return tf.nn.relu(tf.matmul(hidden_3,layer_4_weights) + layer_4_biases)

    logits = convnet(train_input)
    loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=train_output, logits = logits))

    optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)

    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(convnet(test_input))


num_steps = 100000


with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized \n')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch = train_data[offset:(offset+batch_size),:,:,:]
        batch_labels = train_labels[offset:(offset+batch_size),:]
        feed_dict ={train_input: batch, train_output: batch_labels}
        _,l,prediction = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        if (step % 500 == 0):
            print("Loss at step %d: %f" %(step, l))
            print("Accuracy: %f" %(accuracy(prediction, batch_labels)))
    print("Test accuracy: %f" %(accuracy(session.run(test_prediction), test_labels)))

Upvotes: 0

Views: 916

Answers (2)

Prasad

Reputation: 6034

The problem is that your network has very high depth (64 filters in both convolutional layers) and you are training it from scratch. The CIFAR-10 dataset (50,000 images) is quite small for that, and each image is only 32x32x3 in size.

One alternative I can suggest is to fine-tune a pre-trained model, i.e. do transfer learning.
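
A minimal sketch of that idea (this assumes a recent TensorFlow where tf.keras and its pre-trained ImageNet weights are available; it is not a drop-in change to your graph-mode code, and the MobileNetV2 choice is just illustrative):

import tensorflow as tf

# Frozen pre-trained backbone; CIFAR-10 images are upsampled because the
# ImageNet weights expect a larger input resolution (96x96 here).
# Note: MobileNetV2's ImageNet weights expect inputs scaled to [-1, 1].
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False,
                                         weights='imagenet',
                                         pooling='avg')
base.trainable = False  # freeze the pre-trained features

model = tf.keras.Sequential([
    tf.keras.layers.UpSampling2D(size=3, input_shape=(32, 32, 3)),  # 32 -> 96
    base,
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])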

A better alternative is to reduce the number of filters in each layer. That way you can still train the model from scratch, and training will also be faster (assuming you don't have a GPU).
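
For example, the layer definitions from the question could be shrunk along these lines (the filter counts and the smaller stddev are illustrative values, not tuned ones):

depth_1 = 16   # instead of 64 in the first conv layer
depth_2 = 32   # instead of 64 in the second conv layer

layer_1_weights = tf.Variable(
    tf.truncated_normal([patch_size, patch_size, num_channels, depth_1], stddev=0.05))
layer_1_biases = tf.Variable(tf.zeros([depth_1]))

layer_2_weights = tf.Variable(
    tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.05))
layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth_2]))

# After two 2x2 max-poolings a 32x32 image becomes 8x8, so the flattened
# feature size feeding the dense layer is 8 * 8 * depth_2.
layer_3_weights = tf.Variable(
    tf.truncated_normal([8 * 8 * depth_2, num_hidden], stddev=0.05))
layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))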

Next, you are using local response normalization. I would suggest removing this layer and doing mean normalization in the pre-processing step instead.
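
A minimal sketch of that pre-processing, reusing the train_data and test_data arrays from your question (the statistics are computed on the training set only and reused for the test set):

# Per-channel mean/std normalization as a pre-processing step.
mean = train_data.mean(axis=(0, 1, 2), keepdims=True)
std = train_data.std(axis=(0, 1, 2), keepdims=True)
train_data = (train_data - mean) / (std + 1e-7)
test_data = (test_data - mean) / (std + 1e-7)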

Next, if you feel the learning is not picking up at all, try increasing the learning rate a little and see if that helps.
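
For instance (the value is only a starting point to experiment with, not a recommendation):

# Try a 10x larger learning rate than the 1e-4 used in the question.
optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)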

Lastly, just to reduce some operations in your code: you are reshaping your array and then transposing it in several places, like this:

data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1)

Why not directly reshape it to something like this?

data.reshape(data.shape[0], 32, 32, 3)

Hope the answer helps you.

Upvotes: 0

Thomas Pinetz

Reputation: 7148

At first glance I would say the initialization of the CNN is the culprit. Training a convnet is an optimization problem in a highly non-convex space, so it depends a lot on careful initialization to avoid getting stuck in local minima or at saddle points. Look at Xavier initialization for an example of how to fix that.

Example Code:

W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
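
Applied to the first convolution layer from your question, that might look like this (a sketch only; the shape comes straight from your original variables, and tf.contrib.layers is where this initializer lives in TF 1.x):

# Xavier-initialized conv weights instead of the default truncated_normal.
layer_1_weights = tf.get_variable(
    "layer_1_weights",
    shape=[patch_size, patch_size, num_channels, depth],
    initializer=tf.contrib.layers.xavier_initializer())
layer_1_biases = tf.Variable(tf.zeros([depth]))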

Upvotes: 0
