Reputation: 3359
My problem is that I am working through the beginner-level code in the TensorFlow tutorial and have modified it for my needs, but when I make it print sess.run(accuracy, feed_dict={x: x_test, y_: y_test}), it used to always print 1.0; now it always guesses 0's and prints an accuracy of ~93%. When I use tf.argmin(y,1), tf.argmin(y_,1) instead, it guesses all 1's and produces an accuracy of ~7%. Add the two up and they equal 100%. I don't understand how tf.argmin guesses all 1's while tf.argmax guesses all 0's. Obviously something is wrong with the code. Please have a look and let me know what I can do to fix this. I suspect the code is going wrong during training, but I could be wrong.
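Here's a small numpy sketch (separate from my actual script) of the effect I'm describing: with two classes, argmax and argmin make opposite picks on every row, so the two accuracies always sum to 100%.
import numpy as np

# Made-up softmax outputs and one-hot labels for three samples.
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.6, 0.4]])
labels = np.array([[1., 0.],
                   [1., 0.],
                   [0., 1.]])
print (np.argmax(probs, 1) == np.argmax(labels, 1)).mean()  # 0.667
print (np.argmin(probs, 1) == np.argmax(labels, 1)).mean()  # 0.333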
import tensorflow as tf
import numpy as np
from numpy import genfromtxt

data = genfromtxt('cs-training.csv', delimiter=',')      # Training data
test_data = genfromtxt('cs-test.csv', delimiter=',')     # Test data

# Features: every column except the first.
x_train = []
for i in data:
    x_train.append(i[1:])
x_train = np.array(x_train)

# Labels: a two-column one-hot row built from the first column.
y_train = []
for i in data:
    if i[0] == 0:
        y_train.append([1., i[0]])
    else:
        y_train.append([0., i[0]])
y_train = np.array(y_train)

# Zero-fill any NaNs in the features.
where_are_NaNs = np.isnan(x_train)
x_train[where_are_NaNs] = 0
# Same preparation for the test set.
x_test = []
for i in test_data:
    x_test.append(i[1:])
x_test = np.array(x_test)

y_test = []
for i in test_data:
    if i[0] == 0:
        y_test.append([1., i[0]])
    else:
        y_test.append([0., i[0]])
y_test = np.array(y_test)

where_are_NaNs = np.isnan(x_test)
x_test[where_are_NaNs] = 0
x = tf.placeholder("float", [None, 10])
W = tf.Variable(tf.zeros([10,2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,2])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
print "...Training..."
g = 0
for i in range(len(x_train)):
sess.run(train_step, feed_dict={x: [x_train[g]], y_: [y_train[g]]})
g += 1
At this point, if I make it print [x_train[g]] and [y_train[g]] inside the loop, this is what the results look like.
[array([ 7.66126609e-01, 4.50000000e+01, 2.00000000e+00,
8.02982129e-01, 9.12000000e+03, 1.30000000e+01,
0.00000000e+00, 6.00000000e+00, 0.00000000e+00,
2.00000000e+00])]
[array([ 0., 1.])]
Ok, let's carry on then.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.929209
This percentage does not shift. It guesses all zeros regardless of the one-hot labels that I created for the two classes (1 or 0).
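One quick way to confirm this (an extra check, not part of the script above) is to print the predicted classes for the test set directly:
print sess.run(tf.argmax(y, 1), feed_dict={x: x_test})[:20]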
Here's a look at the data:
print x_train[:10]
[[ 7.66126609e-01 4.50000000e+01 2.00000000e+00 8.02982129e-01
9.12000000e+03 1.30000000e+01 0.00000000e+00 6.00000000e+00
0.00000000e+00 2.00000000e+00]
[ 9.57151019e-01 4.00000000e+01 0.00000000e+00 1.21876201e-01
2.60000000e+03 4.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 1.00000000e+00]
[ 6.58180140e-01 3.80000000e+01 1.00000000e+00 8.51133750e-02
3.04200000e+03 2.00000000e+00 1.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 2.33809776e-01 3.00000000e+01 0.00000000e+00 3.60496820e-02
3.30000000e+03 5.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 9.07239400e-01 4.90000000e+01 1.00000000e+00 2.49256950e-02
6.35880000e+04 7.00000000e+00 0.00000000e+00 1.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 2.13178682e-01 7.40000000e+01 0.00000000e+00 3.75606969e-01
3.50000000e+03 3.00000000e+00 0.00000000e+00 1.00000000e+00
0.00000000e+00 1.00000000e+00]
[ 3.05682465e-01 5.70000000e+01 0.00000000e+00 5.71000000e+03
0.00000000e+00 8.00000000e+00 0.00000000e+00 3.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 7.54463648e-01 3.90000000e+01 0.00000000e+00 2.09940017e-01
3.50000000e+03 8.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.16950644e-01 2.70000000e+01 0.00000000e+00 4.60000000e+01
0.00000000e+00 2.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.89169052e-01 5.70000000e+01 0.00000000e+00 6.06290901e-01
2.36840000e+04 9.00000000e+00 0.00000000e+00 4.00000000e+00
0.00000000e+00 2.00000000e+00]]
print y_train[:10]
[[ 0. 1.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]]
print x_test[:20]
[[ 4.83539240e-02 4.40000000e+01 0.00000000e+00 3.02297622e-01
7.48500000e+03 1.10000000e+01 0.00000000e+00 1.00000000e+00
0.00000000e+00 2.00000000e+00]
[ 9.10224439e-01 4.20000000e+01 5.00000000e+00 1.72900000e+03
0.00000000e+00 5.00000000e+00 2.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 2.92682927e-01 5.80000000e+01 0.00000000e+00 3.66480079e-01
3.03600000e+03 7.00000000e+00 0.00000000e+00 1.00000000e+00
0.00000000e+00 1.00000000e+00]
[ 3.11547538e-01 3.30000000e+01 1.00000000e+00 3.55431993e-01
4.67500000e+03 1.10000000e+01 0.00000000e+00 1.00000000e+00
0.00000000e+00 1.00000000e+00]
[ 0.00000000e+00 7.20000000e+01 0.00000000e+00 2.16630600e-03
6.00000000e+03 9.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 2.79217052e-01 4.50000000e+01 1.00000000e+00 4.89921122e-01
6.84500000e+03 8.00000000e+00 0.00000000e+00 2.00000000e+00
0.00000000e+00 2.00000000e+00]
[ 0.00000000e+00 7.80000000e+01 0.00000000e+00 0.00000000e+00
0.00000000e+00 1.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 9.10363487e-01 2.80000000e+01 0.00000000e+00 4.99451497e-01
6.38000000e+03 8.00000000e+00 0.00000000e+00 2.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 6.36595797e-01 4.40000000e+01 0.00000000e+00 7.85457163e-01
4.16600000e+03 6.00000000e+00 0.00000000e+00 1.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.41549211e-01 2.60000000e+01 0.00000000e+00 2.68407434e-01
4.25000000e+03 4.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 4.14101100e-03 7.80000000e+01 0.00000000e+00 2.26362500e-03
5.74200000e+03 7.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 9.99999900e-01 6.00000000e+01 0.00000000e+00 1.20000000e+02
0.00000000e+00 2.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 6.28525944e-01 4.70000000e+01 0.00000000e+00 1.13100000e+03
0.00000000e+00 5.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 2.00000000e+00]
[ 4.02283095e-01 6.00000000e+01 0.00000000e+00 3.79442065e-01
8.63800000e+03 1.00000000e+01 0.00000000e+00 1.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 5.70997900e-03 8.10000000e+01 0.00000000e+00 2.17382000e-04
2.30000000e+04 4.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 4.71171849e-01 5.10000000e+01 0.00000000e+00 1.53700000e+03
0.00000000e+00 1.40000000e+01 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.42395210e-02 8.20000000e+01 0.00000000e+00 7.40466500e-03
2.70000000e+03 1.00000000e+01 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 4.67455800e-02 3.70000000e+01 0.00000000e+00 1.48010090e-02
9.12000000e+03 8.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 4.00000000e+00]
[ 9.99999900e-01 4.70000000e+01 0.00000000e+00 3.54604127e-01
1.10000000e+04 1.10000000e+01 0.00000000e+00 2.00000000e+00
0.00000000e+00 3.00000000e+00]
[ 8.96417860e-02 2.70000000e+01 0.00000000e+00 8.14664000e-03
5.40000000e+03 6.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00]]
print y_test[:20]
[[ 1. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]]
Upvotes: 3
Views: 4120
Reputation: 21927
tl;dr: The way the sample code posted above computes the cross-entropy is not numerically robust. Use tf.nn.softmax_cross_entropy_with_logits instead.
(In response to v1 of the question, which has since changed:) I'm worried that your training is not actually running to completion or working, based on the NaNs in the x_train data that you showed. I'd suggest fixing that first: identify why they showed up, fix that bug, and check whether you also have NaNs in your test set. It might be helpful to show x_test and y_test as well.
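A quick check for that (a sketch; it assumes numpy is imported as np, as in the question) would be:
print "NaNs in x_train:", np.isnan(x_train).sum()
print "NaNs in x_test: ", np.isnan(x_test).sum()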
Finally, I believe there's a bug in the way y_ is handled in relation to x. The code is written as if y_ were a one-hot matrix, but when you show y_train[:10], it only has 10 elements, not 10 * num_classes entries. I suspect a bug there. When you argmax it along axis 1, you will always get a vector full of zeros, because there's only one element on that axis, so of course it's the maximum element. Combine that with a bug producing always-zero output in the estimate, and you're always producing a "correct" answer. :)
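To illustrate with a standalone numpy sketch (made-up values):
import numpy as np

# Shape (4, 1): a bare label column. argmax along axis 1 can only ever
# return 0, because each row has a single element.
y_column = np.array([[0.], [1.], [1.], [0.]])
print np.argmax(y_column, 1)    # [0 0 0 0]

# Shape (4, 2): a proper one-hot matrix yields meaningful class indices.
y_onehot = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])
print np.argmax(y_onehot, 1)    # [0 1 1 0]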
Update for the revised version: In the changed version, if you run it and print out W at the end of every step by changing your code to look like this:
_, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
you'll observe that W is full of NaNs. To debug this, you can either stare at your code until you spot a mathematical problem, or you can instrument the pipeline to find where the NaNs first appear. Let's try that. First, what's the cross_entropy? (Add cross_entropy to the list of tensors in the run call and print it out.)
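Concretely, a sketch of that instrumented call:
_, w_out, b_out, ce = sess.run([train_step, W, b, cross_entropy],
                               feed_dict={x: [x_train[g]], y_: [y_train[g]]})
print "Cross entropy:", ce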
Cross entropy: inf
Great! So why? Well, one answer is that when:
y = [0, 1]
tf.log(y) = [-inf, 0]
This is a valid possible output for y, but one that your computation of the cross-entropy is not robust to. You could either manually add a small epsilon to avoid the corner cases, or use tf.nn.softmax_cross_entropy_with_logits to do it for you. I recommend the latter:
yprime = tf.matmul(x, W) + b    # raw logits
y = tf.nn.softmax(yprime)       # probabilities, still used for accuracy
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(yprime, y_)
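If you preferred the manual-epsilon route instead, a minimal sketch (the clip bounds here are arbitrary) would be:
# Keep probabilities strictly inside (0, 1] so tf.log never sees 0.
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))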
I don't guarantee that your model will work, but this should fix your current NaN problem.
Upvotes: 5