Ricardo

Reputation: 65

SparseCategoricalCrossentropy Shape Mismatch

I want to run a simple test of the SparseCategoricalCrossentropy function to see exactly what it does to an output. For that, I use the output of the last layer of a MobileNetV2.

    import keras.backend as K
    import tensorflow as tf
    import numpy as np

    full_model = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3),
        alpha=1.0,
        include_top=True,
        weights="imagenet",
        input_tensor=None,
        pooling=None,
        classes=1000,
        classifier_activation="softmax",
    )

    func = K.function(full_model.layers[1].input, full_model.layers[155].output)
    conv_output = func([processed_image])
    y_pred = np.single(conv_output)

    y_true = np.zeros(1000).reshape(1, 1000)
    y_true[0][282] = 1

    scce = tf.keras.losses.SparseCategoricalCrossentropy()
    scce(y_true, y_pred).numpy()

processed_image is a 1x224x224x3 array created previously.

I'm getting the error ValueError: Shape mismatch: The shape of labels (received (1000,)) should equal the shape of logits except for the last dimension (received (1, 1000)).

I tried reshaping the arrays to match the dimensions the error mentioned, but it doesn't seem to work. What shapes does it accept?

Upvotes: 1

Views: 366

Answers (1)

AloneTogether

Reputation: 26708

Since you are using the SparseCategoricalCrossentropy loss function, the shape of y_true should be [batch_size] and the shape of y_pred should be [batch_size, num_classes]. Furthermore, y_true should consist of integer values. See the documentation. In your concrete example, you could try something like this:

    import keras.backend as K
    import tensorflow as tf
    import numpy as np

    full_model = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3),
        alpha=1.0,
        include_top=True,
        weights="imagenet",
        input_tensor=None,
        pooling=None,
        classes=1000,
        classifier_activation="softmax",
    )

    batch_size = 1
    processed_image = tf.random.uniform(shape=[batch_size, 224, 224, 3])
    func = K.function(full_model.layers[1].input, full_model.layers[155].output)
    conv_output = func([processed_image])
    y_pred = np.single(conv_output)

    # Generate one integer class index per sample, drawn from the 1000 ImageNet classes (0-999).
    y_true = np.random.randint(low=0, high=1000, size=batch_size)
    # e.g. [984]

    scce = tf.keras.losses.SparseCategoricalCrossentropy()
    scce(y_true, y_pred).numpy()
    # y_pred encodes a probability distribution here and the calculated loss is 10.69202

You can experiment with the batch_size to see how everything works. In the example above, I just used a batch_size of 1.
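If you would rather keep a one-hot y_true like the one in your question, you can switch to CategoricalCrossentropy, which expects one-hot labels of shape [batch_size, num_classes]. Here is a minimal sketch of the two shape contracts using hand-made toy values (4 classes and a made-up probability vector, purely for illustration):

    import tensorflow as tf

    # Toy predicted probabilities, shape [batch_size, num_classes] = [1, 4]
    y_pred = tf.constant([[0.1, 0.1, 0.7, 0.1]])

    # SparseCategoricalCrossentropy: y_true holds integer class indices, shape [batch_size]
    sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()
    print(sparse_loss(tf.constant([2]), y_pred).numpy())  # -log(0.7), about 0.357

    # CategoricalCrossentropy: y_true is one-hot encoded, shape [batch_size, num_classes]
    dense_loss = tf.keras.losses.CategoricalCrossentropy()
    print(dense_loss(tf.constant([[0., 0., 1., 0.]]), y_pred).numpy())  # same value

Both losses default to from_logits=False, so passing probabilities here is consistent with the softmax output of the model above.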

Upvotes: 1
