Ravaal

Reputation: 3359

Why is TensorFlow predicting all 0's or all 1's after training?

I am working through the beginner-level code in the TensorFlow tutorial and have modified it for my needs. When I print sess.run(accuracy, feed_dict={x: x_test, y_: y_test}), it used to always print 1.0; now it always guesses 0's and prints an accuracy of ~93%. When I use tf.argmin(y,1), tf.argmin(y_,1) instead, it guesses all 1's and produces an accuracy of ~7%. The two add up to 100%. I don't understand how tf.argmin guesses all 1's while tf.argmax guesses all 0's. Obviously something is wrong with the code. Please have a look and let me know what I can do to fix this issue. I suspect the code is going wrong during training, but I could be wrong.

import tensorflow as tf
import numpy as np
from numpy import genfromtxt

data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-test.csv',delimiter=',')  # Test data

# Features: every column except the first (the label column)
x_train = []
for i in data:
    x_train.append(i[1:])
x_train = np.array(x_train)

# One-hot labels: [1., 0.] for class 0, [0., 1.] for class 1
y_train = []
for i in data:
    if i[0] == 0:
        y_train.append([1., i[0]])
    else:
        y_train.append([0., i[0]])
y_train = np.array(y_train)

# Replace the NaNs genfromtxt produced for missing fields with zeros
where_are_NaNs = np.isnan(x_train)
x_train[where_are_NaNs] = 0

x_test = []
for i in test_data:
    x_test.append(i[1:])
x_test = np.array(x_test)

y_test = []
for i in test_data:
    if i[0] == 0:
        y_test.append([1., i[0]])
    else:
        y_test.append([0., i[0]])
y_test = np.array(y_test)

where_are_NaNs = np.isnan(x_test)
x_test[where_are_NaNs] = 0

x = tf.placeholder("float", [None, 10])  # 10 input features per row
W = tf.Variable(tf.zeros([10,2]))        # weights: 10 features -> 2 classes
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)    # predicted class probabilities

y_ = tf.placeholder("float", [None,2])   # one-hot ground-truth labels

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

print "...Training..."

for g in range(len(x_train)):
    sess.run(train_step, feed_dict={x: [x_train[g]], y_: [y_train[g]]})

At this point, if I print [x_train[g]] and [y_train[g]] inside the loop, the results look like this.

[array([  7.66126609e-01,   4.50000000e+01,   2.00000000e+00,
     8.02982129e-01,   9.12000000e+03,   1.30000000e+01,
     0.00000000e+00,   6.00000000e+00,   0.00000000e+00,
     2.00000000e+00])]

[array([ 0.,  1.])]

Ok, let's carry on then.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.929209

This percentage does not shift; the model is guessing all zeros regardless of the one-hot labels I created for the two classes (0 or 1).
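One way to see this directly (a quick check reusing the session and tensors defined above) is to print the predicted classes for a few test rows:

# If the model is stuck, this prints the same class index for every row.
print sess.run(tf.argmax(y, 1), feed_dict={x: x_test[:20]})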

Here's a look at the data:

print x_train[:10]

[[  7.66126609e-01   4.50000000e+01   2.00000000e+00   8.02982129e-01
9.12000000e+03   1.30000000e+01   0.00000000e+00   6.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  9.57151019e-01   4.00000000e+01   0.00000000e+00   1.21876201e-01
2.60000000e+03   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  6.58180140e-01   3.80000000e+01   1.00000000e+00   8.51133750e-02
3.04200000e+03   2.00000000e+00   1.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.33809776e-01   3.00000000e+01   0.00000000e+00   3.60496820e-02
3.30000000e+03   5.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.07239400e-01   4.90000000e+01   1.00000000e+00   2.49256950e-02
6.35880000e+04   7.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.13178682e-01   7.40000000e+01   0.00000000e+00   3.75606969e-01
3.50000000e+03   3.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  3.05682465e-01   5.70000000e+01   0.00000000e+00   5.71000000e+03
0.00000000e+00   8.00000000e+00   0.00000000e+00   3.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  7.54463648e-01   3.90000000e+01   0.00000000e+00   2.09940017e-01
3.50000000e+03   8.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.16950644e-01   2.70000000e+01   0.00000000e+00   4.60000000e+01
0.00000000e+00   2.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.89169052e-01   5.70000000e+01   0.00000000e+00   6.06290901e-01
2.36840000e+04   9.00000000e+00   0.00000000e+00   4.00000000e+00
0.00000000e+00   2.00000000e+00]]

print y_train[:10]

[[ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

print x_test[:20]

[[  4.83539240e-02   4.40000000e+01   0.00000000e+00   3.02297622e-01
7.48500000e+03   1.10000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  9.10224439e-01   4.20000000e+01   5.00000000e+00   1.72900000e+03
0.00000000e+00   5.00000000e+00   2.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.92682927e-01   5.80000000e+01   0.00000000e+00   3.66480079e-01
3.03600000e+03   7.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  3.11547538e-01   3.30000000e+01   1.00000000e+00   3.55431993e-01
4.67500000e+03   1.10000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  0.00000000e+00   7.20000000e+01   0.00000000e+00   2.16630600e-03
6.00000000e+03   9.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.79217052e-01   4.50000000e+01   1.00000000e+00   4.89921122e-01
6.84500000e+03   8.00000000e+00   0.00000000e+00   2.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  0.00000000e+00   7.80000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   1.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.10363487e-01   2.80000000e+01   0.00000000e+00   4.99451497e-01
6.38000000e+03   8.00000000e+00   0.00000000e+00   2.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  6.36595797e-01   4.40000000e+01   0.00000000e+00   7.85457163e-01
4.16600000e+03   6.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.41549211e-01   2.60000000e+01   0.00000000e+00   2.68407434e-01
4.25000000e+03   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.14101100e-03   7.80000000e+01   0.00000000e+00   2.26362500e-03
5.74200000e+03   7.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.99999900e-01   6.00000000e+01   0.00000000e+00   1.20000000e+02
0.00000000e+00   2.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  6.28525944e-01   4.70000000e+01   0.00000000e+00   1.13100000e+03
0.00000000e+00   5.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  4.02283095e-01   6.00000000e+01   0.00000000e+00   3.79442065e-01
8.63800000e+03   1.00000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  5.70997900e-03   8.10000000e+01   0.00000000e+00   2.17382000e-04
2.30000000e+04   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.71171849e-01   5.10000000e+01   0.00000000e+00   1.53700000e+03
0.00000000e+00   1.40000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.42395210e-02   8.20000000e+01   0.00000000e+00   7.40466500e-03
2.70000000e+03   1.00000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.67455800e-02   3.70000000e+01   0.00000000e+00   1.48010090e-02
9.12000000e+03   8.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   4.00000000e+00]
 [  9.99999900e-01   4.70000000e+01   0.00000000e+00   3.54604127e-01
1.10000000e+04   1.10000000e+01   0.00000000e+00   2.00000000e+00
0.00000000e+00   3.00000000e+00]
 [  8.96417860e-02   2.70000000e+01   0.00000000e+00   8.14664000e-03
5.40000000e+03   6.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]]

print y_test[:20]
[[ 1.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

Upvotes: 3

Views: 4120

Answers (1)

dga

Reputation: 21927

tl;dr: The way the sample code posted above computes the cross-entropy is not numerically robust. Use tf.nn.softmax_cross_entropy_with_logits instead.

(In response to v1 of the question, which has since changed:) I'm worried that your training is not actually running to completion or working, based on the NaNs in the x_train data that you showed. I'd suggest fixing that first: identify why they show up, fix that bug, and check whether you also have NaNs in your test set. It might be helpful to show x_test and y_test as well.
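A quick way to check (a plain numpy sketch; the variable names follow the question's code):

# genfromtxt turns unparseable fields into NaN, so count them
# in both feature matrices before training.
print "NaNs in x_train:", np.isnan(x_train).sum()
print "NaNs in x_test: ", np.isnan(x_test).sum()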

Finally, I believe there's a bug in the way y_ is handled in relation to x. The code is written as if y_ is a one-hot matrix, but when you show y_train[:10], it has only 10 elements, not 10 rows of num_classes entries each. I suspect a bug there. When you argmax it on axis 1, you're always going to get a vector full of zeros (there's only one element on that axis, so of course it's the maximum). Combine that with a bug producing always-zero output from the estimate, and you're always producing a "correct" answer. :)
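To illustrate the axis-1 point with plain numpy (made-up labels, just to show the shape behavior):

import numpy as np

# A (3, 1) label array: argmax over axis 1 is always 0, so the
# correct_prediction comparison degenerates into comparing constants.
labels_flat = np.array([[1.], [0.], [1.]])
print labels_flat.argmax(axis=1)    # [0 0 0]

# A proper (3, 2) one-hot matrix recovers the real class indices.
labels_onehot = np.array([[0., 1.], [1., 0.], [0., 1.]])
print labels_onehot.argmax(axis=1)  # [1 0 1]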

Update for the revised version: In the changed version, if you print out W after every training step by changing your code to look like this:

_, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
print w_out

you'll observe that W is full of NaNs. To debug this, you can either stare at your code until you spot a mathematical problem, or instrument the pipeline to find where the NaNs first show up. Let's try that. First, what's the cross_entropy? (Add cross_entropy to the list of fetches in the run statement and print it out.)
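Something like this, following the loop variables from the question:

# Fetch the per-step loss alongside the weights to see where it blows up.
_, w_out, b_out, ce = sess.run([train_step, W, b, cross_entropy],
                               feed_dict={x: [x_train[g]], y_: [y_train[g]]})
print "Cross entropy: ", ce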

Cross entropy:  inf

Great! So... why? Well, one answer is that when:

y = [0, 1]
tf.log(y) = [-inf, 0]

This is a valid possible output for y, but one that your computation of the cross-entropy is not robust to. When the true label y_ picks out that zero-probability entry, y_ * tf.log(y) contains -inf, the sum becomes -inf, and the negated cross_entropy becomes inf; the next gradient step then fills W with NaNs. You could either manually add a small epsilon to avoid the corner case, or use tf.nn.softmax_cross_entropy_with_logits to do it for you. I recommend the latter:

yprime = tf.matmul(x,W) + b   # raw logits
y = tf.nn.softmax(yprime)     # probabilities, kept for making predictions
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(yprime, y_)  # works on the logits, numerically stable

I don't guarantee that your model will work, but this should fix your current NaN problem.
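For reference, the manual-epsilon workaround mentioned above would look something like this (1e-8 is an arbitrary small constant; the logits version above is still the better fix):

epsilon = 1e-8  # keeps tf.log() away from log(0) = -inf
cross_entropy = -tf.reduce_sum(y_ * tf.log(y + epsilon))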

Upvotes: 5
