Reputation: 165
I have trained a linear classifier on the MNIST dataset with 92% accuracy. Then I fixed the weights and optimized the input image such that softmax probability for 8 was maximized. But the softmax loss doesn't decrease below 2.302 (-log(1/10)) which means that my training has been useless. What am I doing wrong?
Code for training the weights:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels,
mnist.test.images, mnist.test.labels
X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])
w = tf.Variable(tf.random_normal([784, 10], stddev=0.01))
b = tf.Variable(tf.zeros([10]))
o = tf.nn.sigmoid(tf.matmul(X, w)+b)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=o, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(o, 1)
sess=tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(100):
for start, end in zip(range(0, len(trX), 256), range(256, len(trX)+1, 256)):
sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
print(i, np.mean(np.argmax(teY, axis=1) == sess.run(predict_op, feed_dict={X: teX})))
Code for training the image for fixed weights:
#Copy trained weights into W,B and pass them as placeholders to new model
W=sess.run(w)
B=sess.run(b)
X=tf.Variable(tf.random_normal([1, 784], stddev=0.01))
Y=tf.constant([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
w=tf.placeholder("float")
b=tf.placeholder("float")
o = tf.nn.sigmoid(tf.matmul(X, w)+b)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=o, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(o, 1)
sess.run(tf.global_variables_initializer())
for i in range(1000):
sess.run(train_op, feed_dict={w:W, b:B})
if i%50==0:
sess.run(cost, feed_dict={w:W, b:B})
print(i, sess.run(predict_op, feed_dict={w:W, b:B}))
Upvotes: 1
Views: 655
Reputation: 24591
You shouldn't call tf.sigmoid
on the output of your net. softmax_cross_entropy_with_logits
assumes your inputs are logits, i.e. unconstrained real numbers. Using
o = tf.matmul(X, w)+b
increases your accuracy to 92.8%.
With this modification, your second training works. The cost reaches 0 although the resulting image is anything but appealing.
Upvotes: 1