Reputation: 381
I was building a model to predict hand-written digits from the MNIST data set using the softmax function, and something weird happened. The cost decreased over time and eventually settled around 0.0038... (I used softmax_cross_entropy_with_logits() for the cost function). However, the accuracy was as low as 33%. So I thought, "Well, I don't know what happened there, but if I do softmax and cross entropy separately, maybe it will produce a different result!" And boom! Accuracy went up to 89%. I have no idea why doing softmax and cross entropy separately makes such a different result. I even looked up here: difference between tensorflow tf.nn.softmax and tf.nn.softmax_cross_entropy_with_logits
So this is the code where I used softmax_cross_entropy_with_logits() for the cost function (accuracy: 33%):
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

X = tf.placeholder(shape=[None, 784], dtype=tf.float32)
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32)

W1 = tf.Variable(tf.random_normal([784, 20]))
b1 = tf.Variable(tf.random_normal([20]))
layer1 = tf.nn.softmax(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([20, 10]))
b2 = tf.Variable(tf.random_normal([10]))
logits = tf.matmul(layer1, W2) + b2
hypothesis = tf.nn.softmax(logits)  # just so I can figure out the accuracy

cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)
cost = tf.reduce_mean(cost_i)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

batch_size = 100
train_epoch = 25
display_step = 1

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for epoch in range(train_epoch):
        av_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for batch in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_xs, Y: batch_ys})
            av_cost += sess.run(cost, feed_dict={X: batch_xs, Y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(av_cost))
    print("Optimization Finished!")

    correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))
    print("Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
And this is the one where I did softmax and cross entropy separately (accuracy: 89%):
import tensorflow as tf  # 89% accuracy one
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

X = tf.placeholder(shape=[None, 784], dtype=tf.float32)
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32)

W1 = tf.Variable(tf.random_normal([784, 20]))
b1 = tf.Variable(tf.random_normal([20]))
layer1 = tf.nn.softmax(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([20, 10]))
b2 = tf.Variable(tf.random_normal([10]))

# logits = tf.matmul(layer1, W2) + b2
# cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)
logits = tf.matmul(layer1, W2) + b2
hypothesis = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.reduce_sum(-Y * tf.log(hypothesis)))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

batch_size = 100
train_epoch = 25
display_step = 1

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for epoch in range(train_epoch):
        av_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for batch in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_xs, Y: batch_ys})
            av_cost += sess.run(cost, feed_dict={X: batch_xs, Y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(av_cost))
    print("Optimization Finished!")

    correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))
    print("Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
Upvotes: 2
Views: 528
Reputation: 974
According to the TensorFlow API (here), the second way, cost = tf.reduce_mean(tf.reduce_sum(-Y*tf.log(hypothesis))), is numerically unstable, and because of that you can't get the same results.
Anyway, you can find on my GitHub an implementation of a numerically stable cross entropy loss function that produces the same result as tf.nn.softmax_cross_entropy_with_logits().
You can see that tf.nn.softmax_cross_entropy_with_logits() doesn't compute the softmax normalization of large numbers directly; it only approximates them. More details are in the README section.
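For illustration, here is a minimal sketch (not the repository code itself, just one way such a function can look) of a numerically stable cross entropy computed straight from the logits with the log-sum-exp trick, instead of taking tf.log() of an explicit softmax:

# Illustrative sketch only, not the code from the repository mentioned above.
def stable_softmax_cross_entropy(logits, labels):
    # Subtract the per-row maximum so tf.exp() never overflows for large logits.
    shifted = logits - tf.reduce_max(logits, axis=1, keep_dims=True)
    # log of the softmax denominator, computed on the shifted values.
    log_sum_exp = tf.log(tf.reduce_sum(tf.exp(shifted), axis=1, keep_dims=True))
    log_softmax = shifted - log_sum_exp
    # Per-example cross entropy against the one-hot labels.
    return -tf.reduce_sum(labels * log_softmax, axis=1)

# Possible drop-in replacement for cost_i in the question's first script:
# cost = tf.reduce_mean(stable_softmax_cross_entropy(logits, Y))

Because the logits are shifted by their row maximum before exponentiation, this should stay well behaved even when some logits are large, which is the point of the built-in op.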
Upvotes: 2
Reputation: 2629
If you use tf.reduce_sum() in the upper example, as you did in the lower one, you should be able to achieve similar results with both methods: cost = tf.reduce_mean(tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))).
I increased the number of training epochs to 50 and achieved accuracies of 93.06% (tf.nn.softmax_cross_entropy_with_logits()) and 93.24% (softmax and cross entropy separately), so the results are quite similar.
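For reference, the adjusted cost in the first script could look like the sketch below (everything else left as in the question). Summing the per-example losses over the batch makes the cost, and hence the gradient, roughly batch_size times larger than a plain mean, which acts like a much larger effective learning rate:

# Same graph as in the question's first script; only the cost changes.
# reduce_sum over the per-example losses scales the cost (and its gradient)
# by about batch_size, so with the same learning_rate=0.01 the optimizer
# takes much bigger steps.
cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)
cost = tf.reduce_mean(tf.reduce_sum(cost_i))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)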
Upvotes: 2