codeahead

Reputation: 133

Tensorflow MNIST tutorial - Test Accuracy very low

I am just getting started with tensorflow and have been following this standard MNIST tutorial.

However, in contrast to the expected 92% accuracy, the accuracy over both the training set and the test set does not go beyond 67%. I am familiar with softmax and multinomial regression and have obtained more than 94% with a from-scratch Python implementation as well as with sklearn.linear_model.LogisticRegression.

I also tried the same with the CIFAR-10 dataset, and there the accuracy was only about 10%, which is equivalent to randomly assigning classes. This has made me doubt my installation of tensorflow, yet I am unsure about this.

Here is my implementation of the Tensorflow MNIST tutorial. Could someone please have a look at my implementation?

Upvotes: 3

Views: 1078

Answers (3)

Casey L

Reputation: 707

Not sure if this is still relevant in June 2018, but the MNIST beginner tutorial no longer matches the example code on Github. If you download and run the example code, it does indeed give you the suggested 92% accuracy.

I noticed two things going wrong when following the tutorial:

1) Accidentally calling softmax twice

The tutorial first tells you to define y as follows:

y = tf.nn.softmax(tf.matmul(x, W) + b)

But it later suggests defining cross-entropy with tf.nn.softmax_cross_entropy_with_logits, which makes it easy to accidentally do the following:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)

This would send your logits (tf.matmul(x, W) + b) through softmax twice, which resulted in me getting stuck at a 67% accuracy.
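A minimal sketch of the fix, assuming the placeholder and variable names from the tutorial (x, y_, W, b): keep the raw logits in their own tensor and let the loss apply softmax internally.

import tensorflow as tf

x  = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot labels
W  = tf.Variable(tf.zeros([784, 10]))
b  = tf.Variable(tf.zeros([10]))

logits = tf.matmul(x, W) + b          # raw scores, no softmax here
y = tf.nn.softmax(logits)             # softmax only for making predictions
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))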

However I noticed that even fixing this still only brought me up to a very unstable 80-90% accuracy, which leads me to the next issue:

2) tf.nn.softmax_cross_entropy_with_logits() is deprecated

They haven't updated the tutorial yet, but the tf.nn.softmax_cross_entropy_with_logits page indicates that this function has been deprecated.

In the example code on Github they've replaced it with tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y).

However you can't just swap the function out - the example code also changes the dimensionality on many of the other lines.
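As a rough sketch of that dimensionality change (not the exact Github code), the sparse loss takes integer class indices instead of one-hot vectors, so the labels placeholder becomes a 1-D tensor:

y_ = tf.placeholder(tf.int64, [None])              # class indices 0-9, not one-hot
logits = tf.matmul(x, W) + b
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=logits)

Anything else that consumed one-hot labels (for example the accuracy computation) has to change accordingly, which is why a plain swap of the loss function doesn't work.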

My suggestion to anyone doing this for the first time would be to download the current working example code from Github and try to match it up to the tutorial concepts without taking the instructions literally. Hopefully they will get around to updating it!

Upvotes: 1

Qy Zuo

Reputation: 2732

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Initializing W with zeros may cause your network to learn nothing beyond random guessing, because the gradients will be zero and backpropagation effectively doesn't work at all.

You'd better initialize W with tf.Variable(tf.truncated_normal([784, 10], mean=0.0, stddev=0.01)); see https://www.tensorflow.org/api_docs/python/tf/truncated_normal for more.
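A minimal sketch of that initialization, with stddev=0.01 as suggested above (the bias can stay at zero):

W = tf.Variable(tf.truncated_normal([784, 10], mean=0.0, stddev=0.01))  # small random weights
b = tf.Variable(tf.zeros([10]))                                         # zero bias is fine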

Upvotes: 0

Salvador Dali

Reputation: 222461

You constructed your graph, specified the loss function, and created the optimizer (which is correct). The problem is that you use your optimizer only once:

sess_tf.run(train_step, feed_dict={x: train_images_reshaped[0:1000], y_: train_labels[0:1000]})

So basically you run gradient descent only once. Clearly you can't converge after only one tiny step in the right direction. You need to do something along the lines of:

for _ in range(many_steps):                        # e.g. 1000 iterations
  X, Y = get_a_new_batch_from(mnist_data)          # placeholder: fetch the next mini-batch
  sess_tf.run(train_step, feed_dict={x: X, y_: Y})

If you can't figure out how to modify my pseudo-code, consult the tutorial; from what I remember, it covers this nicely.
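For reference, one concrete way to fill in the pseudo-code, assuming the dataset was loaded with the tutorial's input_data helper into a variable called mnist:

for _ in range(1000):                                # many small gradient steps
  batch_xs, batch_ys = mnist.train.next_batch(100)   # a fresh mini-batch of 100 examples
  sess_tf.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})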

Upvotes: 4
