Ziemo

Reputation: 991

TensorFlow - predicting the next word - loss function logits and target shape

I'm trying to create a language model. I have logits and targets of size [32, 312, 512]

Where:

- 32 is the batch size
- 312 is the sequence length (number of time steps)
- 512 is the vocabulary size

The question is - when I pass logit and target to the loss function as follows:

self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))

Does this compute the appropriate loss for the current batch? Or should I reshape the logits and targets to the following shape: [32, 312*512]?

Thanks in advance for your help!

Upvotes: 0

Views: 122

Answers (2)

Ziemo

Reputation: 991

The answer is: it's irrelevant, since tf.nn.softmax_cross_entropy_with_logits() has a dim argument:

dim: The class dimension. Defaulted to -1 which is the last dimension.
name: A name for the operation (optional).

Also, inside tf.nn.softmax_cross_entropy_with_logits() there is this code:

# Make precise_logits and labels into matrices.
precise_logits = _flatten_outer_dims(precise_logits)
labels = _flatten_outer_dims(labels)
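
Because of that flattening, passing the 3-D tensors directly and reshaping them to 2-D first give the same mean loss. As a minimal sketch (my own check, not from the original post), assuming the TF 1.x graph API used in the question and random one-hot targets:

import numpy as np
import tensorflow as tf

batch, steps, vocab = 32, 312, 512
logits = np.random.randn(batch, steps, vocab).astype(np.float32)
# One-hot targets, so each row along the last axis is a valid distribution.
labels = np.eye(vocab, dtype=np.float32)[
    np.random.randint(vocab, size=(batch, steps))]

# 3-D call: per-position losses of shape [32, 312].
loss_3d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Flattened call: per-row losses of shape [32*312].
loss_2d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=logits.reshape(-1, vocab),
        labels=labels.reshape(-1, vocab)))

with tf.Session() as sess:
    print(sess.run([loss_3d, loss_2d]))  # the two means agree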

Upvotes: 0

amin__

Reputation: 1058

The API documentation says about labels:

labels: Each row labels[i] must be a valid probability distribution

If you are predicting one character at a time, you have a probability distribution (the probabilities over all characters sum to 1) over your vocabulary size of 512. Given that your labels and unscaled logits have shape [32, 312, 512], you should reshape them into [32*312, 512] before calling the function. That way each row of your labels holds a valid probability distribution, the function converts your unscaled logits into a probability distribution, and the loss is then calculated.
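
As a minimal sketch of that reshape (my own illustration; the placeholder names logit and y mirror the question's self.logit and self.y, and the shapes are taken from the question), assuming the TF 1.x graph API:

import tensorflow as tf

logit = tf.placeholder(tf.float32, [32, 312, 512])
y = tf.placeholder(tf.float32, [32, 312, 512])

# Merge batch and time so each row is one prediction over the 512-way vocab.
logit_2d = tf.reshape(logit, [-1, 512])  # -> [32*312, 512]
y_2d = tf.reshape(y, [-1, 512])          # -> [32*312, 512]

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logit_2d, labels=y_2d))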

Upvotes: 1
