Reputation: 133
I am getting started with TensorFlow and have been following this standard MNIST tutorial.
However, in contrast to the expected 92% accuracy, the accuracy obtained over the training set as well as the test set does not go beyond 67%. I am familiar with softmax and multinomial regression and have obtained more than 94% both with a from-scratch Python implementation and with sklearn.linear_model.LogisticRegression.
I tried the same with the CIFAR-10 dataset, and in that case the accuracy was extremely low, just about 10%, which is no better than randomly assigning classes. This has made me doubt my installation of TensorFlow, yet I am unsure about this.
Here is my implementation of the TensorFlow MNIST tutorial. I would appreciate it if someone could have a look at it.
Upvotes: 3
Views: 1078
Reputation: 707
Not sure if this is still relevant in June 2018, but the MNIST beginner tutorial no longer matches the example code on Github. If you download and run the example code, it does indeed give you the suggested 92% accuracy.
I noticed two things going wrong when following the tutorial:
1) Accidentally calling softmax twice
The tutorial first tells you to define y as follows:
y = tf.nn.softmax(tf.matmul(x, W) + b)
But later it suggests that you define cross-entropy using tf.nn.softmax_cross_entropy_with_logits, which makes it easy to accidentally do the following:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
This would send your logits (tf.matmul(x, W) + b) through softmax twice, which is what got me stuck at a 67% accuracy.
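To make the fix concrete, here is a minimal sketch in the tutorial's TF 1.x style (x, W, b and y_ are assumed to be defined as in the tutorial); the key point is that the loss takes the raw logits and applies softmax internally, exactly once (this still uses the function the tutorial names; see point 2 below about its deprecation):
logits = tf.matmul(x, W) + b                  # raw, unnormalized scores
y = tf.nn.softmax(logits)                     # softmax only for reading out predictions
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))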
However, I noticed that even fixing this still only brought me up to a very unstable 80-90% accuracy, which leads me to the next issue:
2) tf.nn.softmax_cross_entropy_with_logits() is deprecated
They haven't updated the tutorial yet, but the tf.nn.softmax_cross_entropy_with_logits page indicates that this function has been deprecated.
In the example code on Github they've replaced it with tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y).
However, you can't just swap the function out - the example code also changes the dimensionality in many of the other lines.
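To make that difference concrete, here is a rough sketch of what the sparse variant looks like, assuming you feed integer class labels (shape [batch]) instead of the one-hot vectors (shape [batch, 10]) used in the tutorial; the names follow the tutorial, but the exact shapes here are my assumption:
y_ = tf.placeholder(tf.int64, [None])         # e.g. 3 instead of [0, 0, 0, 1, 0, ...]
logits = tf.matmul(x, W) + b
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=logits)
# The accuracy computation changes accordingly:
correct_prediction = tf.equal(tf.argmax(logits, 1), y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))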
My suggestion to anyone doing this for the first time would be to download the current working example code from Github and try to match it up to the tutorial concepts without taking the instructions literally. Hopefully they will get around to updating it!
Upvotes: 1
Reputation: 2732
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
The zero initialization of W may cause your network to learn nothing better than random guessing, because the gradients will be zero and backpropagation won't actually work at all.
You'd better initialize W using tf.Variable(tf.truncated_normal([784, 10], mean=0.0, stddev=0.01)).
See https://www.tensorflow.org/api_docs/python/tf/truncated_normal for more.
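If it helps, this is how those two lines would look with that change (just a sketch; the stddev of 0.01 is the value suggested above, and the bias can stay at zero):
W = tf.Variable(tf.truncated_normal([784, 10], mean=0.0, stddev=0.01))  # small random weights
b = tf.Variable(tf.zeros([10]))                                         # zero bias is fine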
Upvotes: 0
Reputation: 222461
You constructed your graph, specified the loss function, and created the optimizer (which is correct). The problem is that you use your optimizer only once:
sess_tf.run(train_step, feed_dict={x: train_images_reshaped[0:1000], y_: train_labels[0:1000]})
So basically you run your gradient descent only once. Clearly you can't converge after only one tiny step in the right direction. You need to do something along these lines:
for _ in xrange(many_steps):
X, Y = get_a_new_batch_from(mnist_data)
sess_tf.run(train_step, feed_dict={x: X, y_: Y})
If you cannot figure out how to adapt my pseudo-code, consult the tutorial; from what I remember, it covers this nicely.
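For reference, a concrete version of the loop, assuming the data was loaded with the tutorial's input_data.read_data_sets helper (the batch size and step count here are arbitrary):
for _ in range(1000):                                   # many small gradient-descent steps
    batch_xs, batch_ys = mnist.train.next_batch(100)    # fresh mini-batch of 100 examples
    sess_tf.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})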
Upvotes: 4