Davis Yoshida

Reputation: 1785

NaN from sparse_softmax_cross_entropy_with_logits in Tensorflow

I am getting NaN when I attempt to use the sparse_softmax_cross_entropy_with_logits loss function in tensorflow. I have a simple network, something like:

layer = tf.nn.relu(tf.matmul(inputs, W1) + b1)
layer = tf.nn.relu(tf.matmul(layer, W2) + b2)
logits = tf.matmul(layer, W3) + b3
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)

I have many classes (~10000), so I imagine I am getting NaN because the logit corresponding to the correct class in at least one of my examples got truncated to zero. Is there a way to avoid this?

Upvotes: 8

Views: 10725

Answers (3)

Davis Yoshida

Reputation: 1785

It turns out that some of my labels were out of range (e.g. a label of 14000 when my logits matrix is only 150 x 10000). Rather than raising an error, this silently produces NaN.
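For anyone hitting the same thing, a minimal sanity check you could add (a sketch in the TF1 graph style used in the question; the names logits and labels are assumed from there, and the keyword-argument form of the loss is the newer 1.x one):

num_classes = tf.shape(logits)[-1]
checks = [tf.assert_non_negative(labels),
          tf.assert_less(labels, tf.cast(num_classes, labels.dtype))]
with tf.control_dependencies(checks):
    # The asserts fail loudly at run time if any label index is outside
    # [0, num_classes), instead of letting the loss silently turn into NaN.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                          logits=logits)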

Upvotes: 12

Guillem Cucurull

Reputation: 1691

The NaN probably occurs when, as you said, one of the softmaxed logits gets truncated to 0, and the cross-entropy then ends up computing log(0).
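To see why, here is a tiny illustration outside TensorFlow (plain NumPy, with made-up values):

import numpy as np

# With a large enough gap between logits, the softmax of the smaller one
# underflows to exactly 0.0, and log(0.0) gives -inf, which then poisons
# the loss and its gradients.
logits = np.array([1000.0, 0.0])
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)          # [1. 0.]
print(np.log(probs))  # [  0. -inf]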

To avoid this, as suggested in this other answer, you could clip the values of the softmax output so that they are never zero.

out = tf.clip_by_value(out, 1e-10, 100.0)

Or you could add a small constant to avoid having zeros:

out = out + 1e-10

The problem is that the softmax is applied to the logits internally by sparse_softmax_cross_entropy_with_logits(), so you cannot change its output before the log is taken.

To work around this, code the cross-entropy loss yourself and add the constant 1e-10 to the output of the softmax, not to the logits.

loss = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits) + 1e-10))

Be aware that sparse_softmax_cross_entropy_with_logits() expects labels to hold the numeric class indices, but if you implement the cross-entropy loss yourself, labels has to be the one-hot encoding of those indices, as sketched below.
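A minimal sketch of that conversion (the depth of 10000 is assumed from the question; adjust it to your number of classes):

num_classes = 10000
one_hot_labels = tf.one_hot(labels, depth=num_classes)
# Same hand-rolled loss as above, now fed with one-hot labels.
loss = -tf.reduce_sum(one_hot_labels * tf.log(tf.nn.softmax(logits) + 1e-10))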

Update: I have corrected the answer thanks to the comment by @mdaoust. As he said, the zeros are only relevant after the softmax has been applied to the logits, not before.

Upvotes: 1

nessuno

Reputation: 27042

tf.nn.sparse_softmax_cross_entropy_with_logits handles the log(0) case for you, so you don't have to worry about it.

Usually a NaN is due to a learning rate that is too high for your optimization algorithm. Lower it until the NaN errors disappear and the loss starts to decrease.
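For instance, with a plain gradient descent optimizer (the optimizer and the concrete values here are assumptions, not from the question):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)  # e.g. down from 0.1
train_op = optimizer.minimize(loss)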

Upvotes: 5
