castelisienne

Reputation: 3

Convolutional neural network: issue with training. Why do the bias arrays evolve during training but not the weight matrices?

I wanted to test an implementation of a convolutional neural network with two convolutional layers and two fully connected layers. A simple fully connected model works fine for me, but I run into a problem as soon as I add the convolutional layers. My initial goal was to tune different hyperparameters to optimize the model's performance. To understand why training was not working (the validation accuracy stays around 0.1), I also added visualization through TensorBoard.

When I run the code below with a single set of hyperparameters, the model doesn't really train: the accuracy never increases. However, TensorBoard shows that all my variables are initialized and that the biases are updated during training, while the weight matrices of the different layers are not.

This is what I have with TensorBoard:

[TensorBoard screenshot: result of the training]

I really don't understand why the model fails to update the weights. I know this can sometimes come from the initialization, but I think I chose reasonable options, didn't I?

If you have any idea where the bug might be, I'd be very interested!

PS: the code isn't the most elegant, but once I saw it wasn't working I wanted to keep it as simple as possible.

from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

LOGDIR = 'tensorboard_claire/tuning2'

patch_size = 5
kernel_size = 2
depth = 16
num_hidden = 64

def generate_hyperparameters():
    # Randomly choose values for the hyperparameters.
    return {"learning_rate": 10 ** np.random.uniform(-3, -1),
            "batch_size": np.random.randint(1, 100),
            "dropout": np.random.uniform(0, 1),
            "stddev": 10 ** np.random.uniform(-4, 2)}

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size, image_size, 
    num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

def conv_layer(data, weights, biases):    
  conv = tf.nn.conv2d(data, weights, [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(conv + biases)
  pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

  return pool

def reshape_drop(data):
  shape = data.get_shape().as_list()
  reshape = tf.reshape(data, [shape[0], shape[1] * shape[2] * shape[3]])
  return reshape

def train_cnn_and_compute_accuracy(hyperparameters, name='train'):
  # Construct a deep network, train it, and return the accuracy on the
  # validation data.
  batch_size = hyperparameters["batch_size"]
  std = hyperparameters["stddev"]

  graph = tf.Graph()
  with graph.as_default():   
    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)

    # Variables

    weights = {
       'conv1' : tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=std), name='convw1'),
       'conv2' : tf.Variable(tf.random_normal([patch_size, patch_size, depth, depth], stddev=std), name='convw2'),
       'fc1' : tf.Variable(tf.random_normal([2 * 2 * depth, num_hidden], stddev=std), name='fcw1'),
       'fc2' : tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=std), name='fcw2')
       }

    biases = {
        'conv1' : tf.Variable(tf.zeros([depth]), name='convb1'),
        'conv2' : tf.Variable(tf.constant(1.0, shape=[depth]), name='convb2'),
        'fc1' : tf.Variable(tf.constant(1.0, shape=[num_hidden]), name='fcb1'),
        'fc2' : tf.Variable(tf.constant(1.0, shape=[num_labels]), name='fcb2')
        }

    # Neural network model with 2 convolutional layers and 2 fully connected layers,
    # with max pooling and dropout

    with tf.name_scope("1st_conv_layer"):
        conv_1_train = conv_layer(tf_train_dataset, weights['conv1'], biases['conv1'])
        conv_1_valid = conv_layer(tf_valid_dataset, weights['conv1'], biases['conv1'])

        tf.summary.histogram("convw1", weights['conv1'])
        tf.summary.histogram("convb1", biases['conv1'])

    with tf.name_scope("2nd_conv_layer"):
        conv_2_train = conv_layer(conv_1_train, weights['conv2'], biases['conv2'])
        conv_2_valid = conv_layer(conv_1_valid, weights['conv2'], biases['conv2'])

        tf.summary.histogram("convw2", weights['conv2'])
        tf.summary.histogram("convb2", biases['conv2'])

    with tf.name_scope('dropout'):
        dropped_train = tf.nn.dropout(conv_2_train, hyperparameters["dropout"])
        dropped_valid = tf.nn.dropout(conv_2_valid, hyperparameters["dropout"])
        reshape_train = reshape_drop(dropped_train)
        reshape_valid = reshape_drop(dropped_valid)

    with tf.name_scope("1st_fc_layer"):
        fc1_train = tf.nn.relu(tf.matmul(reshape_train, weights['fc1']) + biases['fc1'])
        fc1_valid = tf.nn.relu(tf.matmul(reshape_valid, weights['fc1']) + biases['fc1'])

        tf.summary.histogram("fcw1", weights['fc1'])
        tf.summary.histogram("fcb1", biases['fc1'])

    with tf.name_scope("2nd_fc_layer"):
        fc2_train = tf.nn.relu(tf.matmul(fc1_train, weights['fc2']) + biases['fc2'])
        fc2_valid = tf.nn.relu(tf.matmul(fc1_valid, weights['fc2']) + biases['fc2'])

        tf.summary.histogram("fcw2", weights['fc2'])
        tf.summary.histogram("fcb2", biases['fc2'])

    # Predictions

    logits = fc2_train
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(fc2_valid)

    # Loss with or without regularization
    with tf.name_scope('xentropy'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
        tf.summary.scalar("xent", loss)

    # Decaying learning rate and GradientDescent optimizer

    with tf.name_scope('train'):
        global_step = tf.Variable(0, trainable=False)
        learning_rate = tf.train.exponential_decay(hyperparameters["learning_rate"], global_step, 100, 0.96, staircase=True)
        tf.summary.scalar("learning_rate", learning_rate)
        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

    with tf.name_scope("valid_accuracy"):
        correct_prediction = tf.equal(tf.argmax(valid_prediction, 1), tf.argmax(valid_labels, 1))
        # Cast the boolean correctness tensor to float32 to compute the mean accuracy.
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("valid_accuracy", accuracy)

    num_steps = 1001
    val_acc = 0

    with tf.Session(graph=graph) as session:
        summ = tf.summary.merge_all()
        tf.global_variables_initializer().run()
        writer = tf.summary.FileWriter(LOGDIR+"/"+make_hparam_string(hyperparameters))
        writer.add_graph(session.graph)

        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions, summary = session.run([optimizer, loss, train_prediction, summ], feed_dict=feed_dict)

            if step in np.arange(0, num_steps, 70):
                print("Current step: " + str(step))
                val_acc = accuracy.eval()
                print("Validation accuracy : " + str(val_acc))

            if step % 5 == 0:
                writer.add_summary(summary, step)

        writer.close()

    return val_acc

def make_hparam_string(h):
    learning_rate = h["learning_rate"]
    batch_size = h["batch_size"]
    dropout = h["dropout"]
    stddev = h["stddev"]
    return ("lr_" + str(learning_rate) + ",dp_" + str(dropout) + ",batch_size_" + str(batch_size) + ",stddev_" + str(stddev))

# Generate a bunch of hyperparameter configurations.
hyperparameter_configurations = [generate_hyperparameters() for _ in range(5)]

# Launch some experiments.
results = []
for hyperparameters in hyperparameter_configurations:
    print("Hyperparameters : ", hyperparameters.values())
    acc = train_cnn_and_compute_accuracy(hyperparameters)
    results.append(acc)

Upvotes: 0

Views: 183

Answers (1)

asakryukin

Reputation: 2594

The code is a bit messy, but in any case a stddev of 100 is enormous; it should be around 0.1 or less. Next, you should not apply relu (or any other activation function) to the last layer before the softmax. Your dropout range is also quite wide: if you want to keep dropout, at least remove it first and make sure the network can train without it (if you randomly draw a keep probability around 0.1, your weights will hardly get updated), then add it back afterwards.
Try fixing these points first, and if it doesn't help we can look closer.
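
To make that concrete, here is a minimal sketch of those three changes, written against the variable names from the question's own code; the narrowed hyperparameter ranges are only assumptions about what "reasonable" values look like, not tested settings:

def generate_hyperparameters():
    # Keep the weight stddev well below 1 and the keep probability fairly high.
    return {"learning_rate": 10 ** np.random.uniform(-3, -1),
            "batch_size": np.random.randint(16, 128),
            "dropout": np.random.uniform(0.5, 1.0),     # keep_prob, not a drop rate
            "stddev": 10 ** np.random.uniform(-2, -1)}  # roughly 0.01 to 0.1

# Last layer: output raw logits, no relu before softmax_cross_entropy_with_logits.
logits = tf.matmul(fc1_train, weights['fc2']) + biases['fc2']
valid_logits = tf.matmul(fc1_valid, weights['fc2']) + biases['fc2']
train_prediction = tf.nn.softmax(logits)
valid_prediction = tf.nn.softmax(valid_logits)

# Apply dropout only to the training path; the validation path stays deterministic.
dropped_train = tf.nn.dropout(conv_2_train, hyperparameters["dropout"])
reshape_train = reshape_drop(dropped_train)
reshape_valid = reshape_drop(conv_2_valid)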

Upvotes: 3
