Reputation: 1785
I am getting NaN when I attempt to use the sparse_softmax_cross_entropy_with_logits loss function in tensorflow. I have a simple network, something like:
layer = tf.nn.relu(tf.matmul(inputs, W1) + b1)
layer = tf.nn.relu(tf.matmul(layer, W2) + b2)
logits = tf.matmul(layer, W3) + b3
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
I have many classes (~10000), so I imagine I am getting NaN because the logit corresponding to the correct class in at least one of my examples got truncated to zero. Is there a way to avoid this?
Upvotes: 8
Views: 10725
Reputation: 1785
It turns out that some of my labels were out of range (e.g. a label of 14000, when my logits matrix is just 150 x 10000). This results in a NaN rather than an error.
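A quick way to catch this early is to validate the label range before it reaches the loss. Below is a minimal sketch (TF 1.x graph-mode API; num_classes is an assumption based on the question) that makes out-of-range labels raise an explicit error instead of silently producing NaN:
import tensorflow as tf

num_classes = 10000  # assumed number of classes, per the question

# Fail loudly if any label falls outside [0, num_classes).
check = tf.Assert(
    tf.reduce_all(tf.logical_and(labels >= 0, labels < num_classes)),
    [labels])
with tf.control_dependencies([check]):
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)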
Upvotes: 12
Reputation: 1691
The NaN probably occurs when one of the softmaxed logits gets truncated to 0, as you said, and log(0) is then computed for the cross-entropy error.
To avoid this, as suggested in this other answer, you could clip the values of the softmax output so that they are never zero:
out = tf.clip_by_value(out, 1e-10, 100.0)
Or you could add a small constant to avoid having zeros:
out = out + 1e-10
The problem is that sparse_softmax_cross_entropy_with_logits() applies the softmax to the logits internally, so you cannot change its behavior.
To overcome this, implement the cross-entropy error yourself and add the constant 1e-10 to the output of the softmax, not to the logits:
loss = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits) + 1e-10))
Be aware that with the sparse_softmax_cross_entropy_with_logits() function, labels holds the numeric class indices, but if you implement the cross-entropy loss yourself, labels has to be the one-hot encoding of those indices.
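As a concrete example, here is a minimal sketch (TF 1.x API; num_classes is an assumption based on the question) of converting the integer labels to one-hot vectors before using the manual loss above:
import tensorflow as tf

num_classes = 10000  # assumed number of classes, per the question

# Turn integer class indices into one-hot rows so they can be multiplied
# element-wise with the log of the softmax output.
labels_one_hot = tf.one_hot(labels, depth=num_classes)
loss = -tf.reduce_sum(labels_one_hot * tf.log(tf.nn.softmax(logits) + 1e-10))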
Update: I have corrected the answer thanks to the comment by @mdaoust. As he said, the zeros are only relevant after the softmax function has been applied to the logits, not before.
Upvotes: 1
Reputation: 27042
tf.nn.sparse_softmax_cross_entropy_with_logits handles the log(0) case for you, so you don't have to worry about it.
Usually a NaN is caused by a high learning rate in your optimization algorithm. Try lowering it until the NaN errors disappear and the loss starts to decrease.
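For instance, a minimal sketch (TF 1.x API; the optimizer choice and values are illustrative, not from the question):
import tensorflow as tf

# If the loss blows up to NaN, try a smaller step size, e.g. 1e-3 instead of 1e-1.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-3)
train_op = optimizer.minimize(loss)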
Upvotes: 5