Reputation: 991
I'm trying to create a language model. I have logit and target tensors of size [32, 312, 512].
Where:
.shape[0] is batch_size
.shape[1] is sequence_max_len
.shape[2] is vocabulary_size
The question is: when I pass logit and target to the loss function as follows:
self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))
does it compute the appropriate loss for the current batch? Or should I reshape logit and target to the following shape: [32, 312*512]?
Thanks in advance for your help!
Upvotes: 0
Views: 122
Reputation: 991
The answer is: it's irrelevant, since tf.nn.softmax_cross_entropy_with_logits() has a dim argument:
dim: The class dimension. Defaulted to -1 which is the last dimension.
name: A name for the operation (optional).
Also, inside tf.nn.softmax_cross_entropy_with_logits() you have this code:
# Make precise_logits and labels into matrices.
precise_logits = _flatten_outer_dims(precise_logits)
labels = _flatten_outer_dims(labels)
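To illustrate, here is a minimal sketch (assuming TF 2.x eager execution; the dummy tensors and names like target_ids are just placeholders, not part of the original model) showing that passing the rank-3 tensors directly or flattening them yourself gives the same mean loss, because the softmax is taken along the last axis either way:
import numpy as np
import tensorflow as tf

batch, seq_len, vocab = 32, 312, 512

# dummy stand-ins for the real model outputs and targets
logits = tf.random.normal([batch, seq_len, vocab])
target_ids = np.random.randint(0, vocab, size=(batch, seq_len))
labels = tf.one_hot(target_ids, depth=vocab)  # each one-hot row sums to 1

# 1) pass the rank-3 tensors directly; the class axis defaults to the last one
loss_3d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# 2) flatten the outer (batch, time) dimensions yourself
loss_2d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.reshape(labels, [-1, vocab]),
        logits=tf.reshape(logits, [-1, vocab])))

print(loss_3d.numpy(), loss_2d.numpy())  # identical up to float tolerance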
Upvotes: 0
Reputation: 1058
The API documentation says about labels:
labels: Each row labels[i] must be a valid probability distribution
If you are predicting one character at a time, you have a probability distribution over your vocabulary of size 512 (the probabilities of the individual characters sum to 1). Given that your labels and unscaled logits have shape [32, 312, 512], you should reshape them to [32*312, 512] before calling the function. That way each row of your labels is a valid probability distribution, the function itself converts your unscaled logits into a probability distribution, and the loss is then calculated.
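A minimal sketch of the reshape described above (assuming TF 2.x eager execution; logits, target_ids, etc. are hypothetical placeholders for your real tensors):
import tensorflow as tf

batch, seq_len, vocab = 32, 312, 512

# hypothetical stand-ins for the model output and the integer targets
logits = tf.random.normal([batch, seq_len, vocab])
target_ids = tf.random.uniform([batch, seq_len], maxval=vocab, dtype=tf.int32)

# one-hot targets so each labels row is a valid probability distribution
labels = tf.one_hot(target_ids, depth=vocab)               # [32, 312, 512]

# collapse batch and time into rows: one row per predicted character
logits_2d = tf.reshape(logits, [batch * seq_len, vocab])   # [32*312, 512]
labels_2d = tf.reshape(labels, [batch * seq_len, vocab])

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels_2d, logits=logits_2d))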
Upvotes: 1