Ravaal

Reputation: 3359

Why is TensorFlow predicting all 0's or all 1's after training?

I am working through the beginner-level code in the TensorFlow tutorial and have modified it for my needs. When I print sess.run(accuracy, feed_dict={x: x_test, y_: y_test}), it used to always print 1.0; now it always guesses 0's and prints an accuracy of ~93%. When I use tf.argmin(y,1), tf.argmin(y_,1) instead, it guesses all 1's and produces an accuracy of ~7%. The two add up to 100%. I don't understand how tf.argmin guesses all 1's while tf.argmax guesses all 0's. Obviously something is wrong with the code. Please have a look and let me know what I can do to fix this issue. I suspect the code is going wrong during training, but I could be wrong.

import tensorflow as tf
import numpy as np
from numpy import genfromtxt

data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-test.csv',delimiter=',')  # Test data

# Features: every column except the first (the label column)
x_train = []
for i in data:
    x_train.append(i[1:])
x_train = np.array(x_train)

# One-hot labels: [1., 0.] for class 0, [0., 1.] for class 1
y_train = []
for i in data:
    if i[0] == 0:
        y_train.append([1., i[0]])
    else:
        y_train.append([0., i[0]])
y_train = np.array(y_train)

# Replace the NaNs genfromtxt produced for missing fields with zeros
where_are_NaNs = np.isnan(x_train)
x_train[where_are_NaNs] = 0

x_test = []
for i in test_data:
    x_test.append(i[1:])
x_test = np.array(x_test)

y_test = []
for i in test_data:
    if i[0] == 0:
        y_test.append([1., i[0]])
    else:
        y_test.append([0., i[0]])
y_test = np.array(y_test)

where_are_NaNs = np.isnan(x_test)
x_test[where_are_NaNs] = 0

x = tf.placeholder("float", [None, 10])  # 10 input features per row
W = tf.Variable(tf.zeros([10,2]))        # weights: 10 features -> 2 classes
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)    # predicted class probabilities

y_ = tf.placeholder("float", [None,2])   # one-hot ground-truth labels

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

print "...Training..."

for g in range(len(x_train)):
    sess.run(train_step, feed_dict={x: [x_train[g]], y_: [y_train[g]]})

At this point, if I print [x_train[g]] and [y_train[g]] inside the loop, the results look like this.

[array([  7.66126609e-01,   4.50000000e+01,   2.00000000e+00,
     8.02982129e-01,   9.12000000e+03,   1.30000000e+01,
     0.00000000e+00,   6.00000000e+00,   0.00000000e+00,
     2.00000000e+00])]

[array([ 0.,  1.])]

Ok, let's carry on then.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.929209

This percentage does not shift; the model is guessing all zeros regardless of the one-hot labels I created for the two classes (0 or 1).
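One way to see this directly (a quick check reusing the session and tensors defined above) is to print the predicted classes for a few test rows:

# If the model is stuck, this prints the same class index for every row.
print sess.run(tf.argmax(y, 1), feed_dict={x: x_test[:20]})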

Here's a look at the data:

print x_train[:10]

[[  7.66126609e-01   4.50000000e+01   2.00000000e+00   8.02982129e-01
9.12000000e+03   1.30000000e+01   0.00000000e+00   6.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  9.57151019e-01   4.00000000e+01   0.00000000e+00   1.21876201e-01
2.60000000e+03   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  6.58180140e-01   3.80000000e+01   1.00000000e+00   8.51133750e-02
3.04200000e+03   2.00000000e+00   1.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.33809776e-01   3.00000000e+01   0.00000000e+00   3.60496820e-02
3.30000000e+03   5.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.07239400e-01   4.90000000e+01   1.00000000e+00   2.49256950e-02
6.35880000e+04   7.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.13178682e-01   7.40000000e+01   0.00000000e+00   3.75606969e-01
3.50000000e+03   3.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  3.05682465e-01   5.70000000e+01   0.00000000e+00   5.71000000e+03
0.00000000e+00   8.00000000e+00   0.00000000e+00   3.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  7.54463648e-01   3.90000000e+01   0.00000000e+00   2.09940017e-01
3.50000000e+03   8.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.16950644e-01   2.70000000e+01   0.00000000e+00   4.60000000e+01
0.00000000e+00   2.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.89169052e-01   5.70000000e+01   0.00000000e+00   6.06290901e-01
2.36840000e+04   9.00000000e+00   0.00000000e+00   4.00000000e+00
0.00000000e+00   2.00000000e+00]]

print y_train[:10]

[[ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

print x_test[:20]

[[  4.83539240e-02   4.40000000e+01   0.00000000e+00   3.02297622e-01
7.48500000e+03   1.10000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  9.10224439e-01   4.20000000e+01   5.00000000e+00   1.72900000e+03
0.00000000e+00   5.00000000e+00   2.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.92682927e-01   5.80000000e+01   0.00000000e+00   3.66480079e-01
3.03600000e+03   7.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  3.11547538e-01   3.30000000e+01   1.00000000e+00   3.55431993e-01
4.67500000e+03   1.10000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   1.00000000e+00]
 [  0.00000000e+00   7.20000000e+01   0.00000000e+00   2.16630600e-03
6.00000000e+03   9.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  2.79217052e-01   4.50000000e+01   1.00000000e+00   4.89921122e-01
6.84500000e+03   8.00000000e+00   0.00000000e+00   2.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  0.00000000e+00   7.80000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   1.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.10363487e-01   2.80000000e+01   0.00000000e+00   4.99451497e-01
6.38000000e+03   8.00000000e+00   0.00000000e+00   2.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  6.36595797e-01   4.40000000e+01   0.00000000e+00   7.85457163e-01
4.16600000e+03   6.00000000e+00   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.41549211e-01   2.60000000e+01   0.00000000e+00   2.68407434e-01
4.25000000e+03   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.14101100e-03   7.80000000e+01   0.00000000e+00   2.26362500e-03
5.74200000e+03   7.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  9.99999900e-01   6.00000000e+01   0.00000000e+00   1.20000000e+02
0.00000000e+00   2.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  6.28525944e-01   4.70000000e+01   0.00000000e+00   1.13100000e+03
0.00000000e+00   5.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  4.02283095e-01   6.00000000e+01   0.00000000e+00   3.79442065e-01
8.63800000e+03   1.00000000e+01   0.00000000e+00   1.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  5.70997900e-03   8.10000000e+01   0.00000000e+00   2.17382000e-04
2.30000000e+04   4.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.71171849e-01   5.10000000e+01   0.00000000e+00   1.53700000e+03
0.00000000e+00   1.40000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.42395210e-02   8.20000000e+01   0.00000000e+00   7.40466500e-03
2.70000000e+03   1.00000000e+01   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  4.67455800e-02   3.70000000e+01   0.00000000e+00   1.48010090e-02
9.12000000e+03   8.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   4.00000000e+00]
 [  9.99999900e-01   4.70000000e+01   0.00000000e+00   3.54604127e-01
1.10000000e+04   1.10000000e+01   0.00000000e+00   2.00000000e+00
0.00000000e+00   3.00000000e+00]
 [  8.96417860e-02   2.70000000e+01   0.00000000e+00   8.14664000e-03
5.40000000e+03   6.00000000e+00   0.00000000e+00   0.00000000e+00
0.00000000e+00   0.00000000e+00]]

print y_test[:20]
[[ 1.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

Upvotes: 3

Views: 4120

Answers (1)

dga

Reputation: 21927

tl;dr: The way the sample code posted above computes the cross-entropy is not numerically robust. Use tf.nn.softmax_cross_entropy_with_logits instead.

(In response to v1 of the question, which has since changed:) I'm worried that your training is not actually running to completion or working, based on the NaNs in the x_train data that you showed. I'd suggest fixing that first: identify why they show up, fix that bug, and check whether you also have NaNs in your test set. It might be helpful to show x_test and y_test as well.
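A quick way to check (a plain numpy sketch; the variable names follow the question's code):

# genfromtxt turns unparseable fields into NaN, so count them
# in both feature matrices before training.
print "NaNs in x_train:", np.isnan(x_train).sum()
print "NaNs in x_test: ", np.isnan(x_test).sum()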

Finally, I believe there's a bug in the way y_ is handled in relation to x. The code is written as if y_ is a one-hot matrix, but when you show y_train[:10], it has only 10 elements, not 10 rows of num_classes entries each. I suspect a bug there. When you argmax it on axis 1, you're always going to get a vector full of zeros (there's only one element on that axis, so of course it's the maximum). Combine that with a bug producing always-zero output from the estimate, and you're always producing a "correct" answer. :)
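To illustrate the axis-1 point with plain numpy (made-up labels, just to show the shape behavior):

import numpy as np

# A (3, 1) label array: argmax over axis 1 is always 0, so the
# correct_prediction comparison degenerates into comparing constants.
labels_flat = np.array([[1.], [0.], [1.]])
print labels_flat.argmax(axis=1)    # [0 0 0]

# A proper (3, 2) one-hot matrix recovers the real class indices.
labels_onehot = np.array([[0., 1.], [1., 0.], [0., 1.]])
print labels_onehot.argmax(axis=1)  # [1 0 1]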

Update for the revised version: In the changed version, if you print out W after every training step by changing your code to look like this:

_, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
print w_out

you'll observe that W is full of NaNs. To debug this, you can either stare at your code until you spot a mathematical problem, or instrument the pipeline to find where the NaNs first show up. Let's try that. First, what's the cross_entropy? (Add cross_entropy to the list of fetches in the run statement and print it out.)
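Something like this, following the loop variables from the question:

# Fetch the per-step loss alongside the weights to see where it blows up.
_, w_out, b_out, ce = sess.run([train_step, W, b, cross_entropy],
                               feed_dict={x: [x_train[g]], y_: [y_train[g]]})
print "Cross entropy: ", ce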

Cross entropy:  inf

Great! So... why? Well, one answer is that when:

y = [0, 1]
tf.log(y) = [-inf, 0]

This is a valid possible output for y, but one that your computation of the cross-entropy is not robust to. When the true label y_ picks out that zero-probability entry, y_ * tf.log(y) contains -inf, the sum becomes -inf, and the negated cross_entropy becomes inf; the next gradient step then fills W with NaNs. You could either manually add a small epsilon to avoid the corner case, or use tf.nn.softmax_cross_entropy_with_logits to do it for you. I recommend the latter:

yprime = tf.matmul(x,W) + b   # raw logits
y = tf.nn.softmax(yprime)     # probabilities, kept for making predictions
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(yprime, y_)  # works on the logits, numerically stable

I don't guarantee that your model will work, but this should fix your current NaN problem.
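For reference, the manual-epsilon workaround mentioned above would look something like this (1e-8 is an arbitrary small constant; the logits version above is still the better fix):

epsilon = 1e-8  # keeps tf.log() away from log(0) = -inf
cross_entropy = -tf.reduce_sum(y_ * tf.log(y + epsilon))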

Upvotes: 5
