Tai Christian

Reputation: 665

How to create 2-layers neural network using TensorFlow and python on MNIST data

I'm a newbie in machine learning and I am following TensorFlow's tutorial to create some simple neural networks that learn the MNIST data.

I built a single-layer network (following the tutorial) and the accuracy was about 0.92, which is OK for me. But then I added one more layer and the accuracy dropped to 0.113, which is very bad.

Below is how the two layers are connected:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.nn.softmax(tf.matmul(x, W1) + b1)

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])

Is my structure fine? What is the reason that makes it perform so bad? How should I modify my network?

Upvotes: 6

Views: 9615

Answers (3)

Kåre Jonsson

Reputation: 31

I tried to run the code snippets above. I discarded results below 90%, and I was never really sure I had done exactly what the comments above suggested. Here is my full code.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])

#layer 1
W1 = tf.get_variable('w1', [784, 100], initializer=tf.random_normal_initializer())
b1 = tf.get_variable('b1', [1,], initializer=tf.random_normal_initializer())
y1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1) 

#layer 2
W2 = tf.get_variable('w2', [100, 10], initializer=tf.random_normal_initializer())
b2 = tf.get_variable('b2',[1,], initializer=tf.random_normal_initializer())
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(10000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

By changing 10000 -> 200000 I reached 95.5%.

Upvotes: 0

GabrielChu

Reputation: 6156

I ran into exactly the same problem: the gradients diverged and I got a bunch of nan values for the predicted y. I implemented what nessuno suggested, but unfortunately that alone did not fix the diverging gradients.

Instead I tried sigmoid as the activation function for layer 1, and it worked! But relu did not work when W1 and W2 were initialized as zero matrices; the accuracy was only 0.1135. To make both relu and sigmoid work, it is better to randomize the initialization of W1 and W2. Here's the modified code:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

# layer 1
with tf.variable_scope('layer1'):
    W1 = tf.get_variable('w1',[784,200],
                         initializer=tf.random_normal_initializer())
    b1 = tf.get_variable('b1',[1,],
                         initializer=tf.constant_initializer(0.0))
    y1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
#   y1 = tf.nn.relu(tf.matmul(x, W1) + b1) # alternative choice for activation

# layer 2
with tf.variable_scope('layer2'):
    W2 = tf.get_variable('w2',[200,10],
                         initializer=tf.random_normal_initializer())
    b2 = tf.get_variable('b2',[1,],
                         initializer=tf.constant_initializer(0.0))
    y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

# output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])

I found this link helpful; see question 2 part (c), which gives the backpropagation derivatives for a basic 2-layer neural network. In my opinion, when you do not specify any activation function and just apply a linear flow in layer 1, backpropagation ends up with a gradient that looks something like (something) * W2^T * W1^T, and since we initialize both W1 and W2 to zeros, their product is zero (or extremely close to it), which results in vanishing gradients.
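
To see this concretely, here is a small sketch (TF 1.x; the batch values and variable names are only for illustration) that evaluates the gradients of the cross-entropy loss for a zero-initialized two-layer network with a linear first layer. Both gradients come out exactly zero, so gradient descent can never move W1 or W2:

import numpy as np
import tensorflow as tf

x  = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# zero-initialized weights and a linear (no activation) first layer
W1 = tf.Variable(tf.zeros([784, 100]))
W2 = tf.Variable(tf.zeros([100, 10]))
y1 = tf.matmul(x, W1)
y  = tf.nn.softmax(tf.matmul(y1, W2))

loss  = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y + 1e-10), axis=1))
grads = tf.gradients(loss, [W1, W2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    fake_x = np.random.rand(32, 784).astype(np.float32)                   # dummy batch
    fake_y = np.eye(10)[np.random.randint(0, 10, 32)].astype(np.float32)  # dummy one-hot labels
    g1, g2 = sess.run(grads, feed_dict={x: fake_x, y_: fake_y})
    print(np.abs(g1).max(), np.abs(g2).max())  # both print 0.0: the weights can never leave zero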

UPDATE

This is from a Quora answer Ofir posted about good initial weights in a neural network:

The most common initializations are random initialization and Xavier initialization. Random initialization just samples each weight from a standard distribution (often a normal distribution) with low deviation. The low deviation allows you to bias the network towards the 'simple' 0 solution, without the bad repercussions of actually initializing the weights to 0.
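
As a point of reference (this is not part of the quoted answer), both kinds of initialization mentioned above can be written in TF 1.x roughly as follows; the shapes and variable names are only for illustration:

import tensorflow as tf

# random initialization: a normal distribution with a small standard deviation
W1 = tf.get_variable('w1', [784, 200],
                     initializer=tf.truncated_normal_initializer(stddev=0.1))

# Xavier/Glorot initialization, which scales the spread by the layer's fan-in and fan-out
W2 = tf.get_variable('w2', [200, 10],
                     initializer=tf.contrib.layers.xavier_initializer())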

Upvotes: 0

nessuno

Reputation: 27042

The input of the 2nd layer is the softmax of the output of the first layer. You don't want to do that.

You're forcing the sum of these values to be 1. If some value of tf.matmul(x, W1) + b1 is about 0 (and some certainly are), the softmax operation lowers this value towards 0. Result: you're killing the gradient and nothing can flow through these neurons.

If you remove the softmax between the layers (but leave the softmax on the output layer if you want to interpret the values as probabilities), your network will work fine.

Tl;dr:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.matmul(x, W1) + b1 #remove softmax

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])
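
A side note that is not in the original answer: if you compute the cross-entropy yourself, TF 1.x also provides tf.nn.softmax_cross_entropy_with_logits, which takes the raw pre-softmax logits and is more numerically stable than calling tf.log on a softmax output. A minimal sketch of how the loss and training step could look with this network (the learning rate here is arbitrary):

# sketch only: feed the pre-softmax logits into the built-in cross-entropy
logits = tf.matmul(y1, W2) + b2
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)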

Upvotes: 10