SRobertJames

Reputation: 9263

Two layer neural network performs worse than single layer

I'm learning TensorFlow, and trying to create a simple two layer neural network.

The tutorial code at https://www.tensorflow.org/get_started/mnist/pros starts with this simple network, which gets 92% accuracy:

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

I tried replacing it with this very simple network, adding a new layer, but accuracy now drops to 84%!!!

layer1_len = 10
# hidden layer: 784 inputs -> layer1_len ReLU units
w1 = weight_var([784, layer1_len])
b1 = bias_var([layer1_len])
o1 = tf.nn.relu(tf.matmul(x, w1) + b1)
# output layer: layer1_len units -> 10 class scores, then softmax
w2 = weight_var([layer1_len, 10])
b2 = bias_var([10])
y = tf.nn.softmax(tf.matmul(o1, w2) + b2)

I get that result with several different values for layer1_len as well as different numbers of training steps. (Note that if I omit the weight_var and bias_var random initialization, and keep everything at zero, accuracy drops to close to 10%, essentially no better than guessing.)
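In case it matters, weight_var and bias_var are small helpers along the lines of the tutorial's weight_variable / bias_variable:

def weight_var(shape):
    # truncated-normal init breaks the symmetry between hidden units
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_var(shape):
    # small positive bias so the ReLUs start out active
    return tf.Variable(tf.constant(0.1, shape=shape))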

What am I doing wrong?

Upvotes: 2

Views: 945

Answers (1)

Salvador Dali

Reputation: 222811

There is nothing wrong. The problem is that adding layers does not automatically mean higher accuracy (otherwise machine learning would be more or less solved: whenever you needed a better image classifier, you would just add one more layer to Inception and claim victory).

To show you that this is not just your problem, take a look at this well-known paper: Deep Residual Learning for Image Recognition. The authors observe that simply stacking more layers makes the score worse (the observation) and design an architecture that overcomes this degradation (the important part). Here is a small excerpt from it:

[figure from the paper: training and test error curves for plain networks of two different depths]

"The deeper network has higher training error, and thus test error."
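For intuition, the paper's fix is to add identity skip connections, so that an extra block can default to passing its input through instead of having to re-learn the identity. A minimal sketch in the question's own style (it reuses the weight_var / bias_var helpers from the question; the residual_block name is mine):

def residual_block(x, dim):
    # F(x): two fully connected layers
    w1 = weight_var([dim, dim])
    b1 = bias_var([dim])
    h = tf.nn.relu(tf.matmul(x, w1) + b1)
    w2 = weight_var([dim, dim])
    b2 = bias_var([dim])
    # output is relu(F(x) + x): if the layers learn nothing useful,
    # the block still passes x through unchanged
    return tf.nn.relu(tf.matmul(h, w2) + b2 + x)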

Upvotes: 5
