Amir Dadon

Reputation: 11

NaN on loss function TensorFlow

I am trying to build a model that classifies my data, using the loss function loss = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))). As of now I only get NaN for the prediction, and printing the loss also gives NaN.

import tensorflow as tf
import numpy as np

# `labels` and `fvecs` come from the data-loading code (not shown)
np_labels = np.array(labels)
np_labels = np_labels.reshape([np_labels.shape[0], 1])

features = 910
hidden_layer_nodes = 100

x = tf.placeholder(tf.float32, [None, features])
y_ = tf.placeholder(tf.float32, [None, 1])

# Hidden layer: 910 -> 100 with ReLU
W1 = tf.Variable(tf.truncated_normal([features, hidden_layer_nodes], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[hidden_layer_nodes]))
z1 = tf.add(tf.matmul(x, W1), b1)
a1 = tf.nn.relu(z1)

# Output layer: 100 -> 1 with a hand-rolled sigmoid
W2 = tf.Variable(tf.truncated_normal([hidden_layer_nodes, 1], stddev=0.1))
b2 = tf.Variable(0.)
z2 = tf.matmul(a1, W2) + b2
y = 1 / (1.0 + tf.exp(-z2))

# Hand-rolled binary cross-entropy
loss = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y)))

update = tf.train.AdamOptimizer(0.01).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(0, 50):
    sess.run(update, feed_dict={x: fvecs, y_: np_labels})
    print(sess.run(loss, feed_dict={x: fvecs, y_: np_labels}))

    # sess.run(update, feed_dict={x: data_x, y_: data_y})
    # print(sess.run(loss, feed_dict={x: data_x, y_: data_y}))

print('prediction: ', y.eval(session=sess, feed_dict={x: [[493.9, 702.6, .....

I want to print the loss.

Thanks

Upvotes: 0

Views: 312

Answers (1)

Patwie

Reputation: 4450

This is not a TensorFlow issue. It results from the very bad idea of implementing the loss function yourself:

import tensorflow as tf

# Simulate large-magnitude logits and some labels
z2 = tf.random_normal([8, 10]) * 20
y_ = tf.random_uniform([8, 1], minval=0, maxval=10, dtype=tf.float32)

# Hand-rolled sigmoid and cross-entropy, as in the question
y = 1 / (1.0 + tf.exp(-z2))
loss = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y)))

with tf.Session() as sess:
    print(sess.run(loss))  # will almost always fail (NaN or Inf)

This gives Inf or NaN simply because the implementation misses the log-sum-exp trick and is therefore numerically unstable (a folklore example that produces overflows): with logits of this magnitude, the sigmoid saturates to exactly 0 or 1 in float32, and tf.log(0) evaluates to -inf. Run this code a few times and you will get either NaN or Inf.
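For reference, sigmoid_cross_entropy_with_logits avoids this by working on the raw logits: per the TensorFlow docs it computes max(x, 0) - x * z + log(1 + exp(-|x|)), which never takes the log of 0. A minimal sketch of that formulation (stable_sigmoid_xent is a hypothetical helper name, not part of TensorFlow):

import tensorflow as tf

def stable_sigmoid_xent(logits, labels):
    # Algebraically equal to -labels * log(sigmoid(logits))
    #                        - (1 - labels) * log(1 - sigmoid(logits)),
    # but rewritten so neither log(0) nor an exp overflow can occur.
    return (tf.maximum(logits, 0.) - logits * labels
            + tf.log(1. + tf.exp(-tf.abs(logits))))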

The solution would be:

  1. replace y = 1 / (1.0 + tf.exp(-z2)) (i.e. tf.sigmoid(z2)) by y = tf.identity(z2) to just get the untransformed logits
  2. replace the hand-rolled loss by loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y)) to use the numerically stable way (a complete sketch follows below)

See the docs of sigmoid_cross_entropy_with_logits, which explicitly describe this issue.
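Applied to the code from the question, a minimal sketch of the corrected graph (keeping the question's variable names; fvecs and np_labels are assumed to come from the asker's data loading):

import tensorflow as tf

features = 910
hidden_layer_nodes = 100

x = tf.placeholder(tf.float32, [None, features])
y_ = tf.placeholder(tf.float32, [None, 1])

W1 = tf.Variable(tf.truncated_normal([features, hidden_layer_nodes], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[hidden_layer_nodes]))
a1 = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = tf.Variable(tf.truncated_normal([hidden_layer_nodes, 1], stddev=0.1))
b2 = tf.Variable(0.)
z2 = tf.matmul(a1, W2) + b2  # raw logits, no sigmoid here

# Numerically stable cross-entropy, computed from the logits
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=z2))

update = tf.train.AdamOptimizer(0.01).minimize(loss)

# Apply the sigmoid only when you need probabilities for prediction
prediction = tf.sigmoid(z2)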

Upvotes: 1
