user2814010

Reputation: 29

Understanding output of softmax_cross_entropy_with_logits

I am new to TensorFlow. Can someone explain how we get the answer 1.16012561?

    import tensorflow as tf

    unscaled_logits = tf.constant([[1., -3., 10.]])
    target_dist = tf.constant([[0.1, 0.02, 0.88]])
    softmax_xentropy = tf.nn.softmax_cross_entropy_with_logits(
        logits=unscaled_logits, labels=target_dist)
    with tf.Session() as sess:
        print(sess.run(softmax_xentropy))

Output: [ 1.16012561]

Upvotes: 2

Views: 2609

Answers (1)

javidcf

Reputation: 59681

It works like this. First, the logits are passed through the softmax function, giving you a probability distribution:

import numpy as np

logits = np.array([1., -3., 10.])
# Softmax function: exponentiate and normalize
softmax = np.exp(logits) / np.sum(np.exp(logits))
print(softmax)
>>> [1.23394297e-04 2.26004539e-06 9.99874346e-01]
# It is a probability distribution because the values are in [0, 1]
# and add up to 1
print(np.sum(softmax))
>>> 0.9999999999999999  # Almost, that is

Then, you compute the cross-entropy between the computed softmax distribution and the target distribution:

target = np.array([0.1, 0.02, 0.88])
# Cross-entropy function
crossentropy = -np.sum(target * np.log(softmax))
print(crossentropy)
>>> 1.1601256622376641

tf.nn.softmax_cross_entropy_with_logits returns one of these values "per vector" (by default, "vectors" lie along the last dimension), so, for example, if your input logits and targets have size 10x3 you will end up with 10 cross-entropy values. Usually one sums or averages all of these and uses the result as the loss value to minimize (which is what tf.losses.softmax_cross_entropy offers); a sketch of this batched behavior follows below. The logic behind the cross-entropy expression is that target * np.log(softmax) takes negative values close to zero where target is similar to softmax, and diverges from zero (towards minus infinity) where they differ.
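For illustration, here is a minimal NumPy sketch of that batched behavior; the 10x3 shape and the uniform target distributions are made up for the example:

import numpy as np

# Hypothetical batch: 10 examples, 3 classes each
logits = np.random.randn(10, 3)
targets = np.full((10, 3), 1.0 / 3.0)  # uniform target distributions

# Softmax along the last dimension
exp = np.exp(logits)
softmax = exp / np.sum(exp, axis=-1, keepdims=True)

# One cross-entropy value per row, shape (10,)
crossentropy = -np.sum(targets * np.log(softmax), axis=-1)
print(crossentropy.shape)
>>> (10,)

# Typical loss: the mean over the batch
loss = np.mean(crossentropy)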

Note: This is a logical explanation of the function. Internally, TensorFlow most likely performs different but equivalent operations for better performance and numerical stability.
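As a rough illustration of that stability point (not TensorFlow's actual implementation), one common trick is to compute a log-softmax after shifting the logits by their maximum, which avoids overflow in np.exp and avoids taking the log of a value that underflows to zero:

import numpy as np

logits = np.array([1., -3., 10.])
target = np.array([0.1, 0.02, 0.88])

# Numerically stable log-softmax: shift by the max logit first
shifted = logits - np.max(logits)
log_softmax = shifted - np.log(np.sum(np.exp(shifted)))

crossentropy = -np.sum(target * log_softmax)
print(crossentropy)
>>> 1.1601256622376641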

Upvotes: 3
