Reputation: 5696
I'm having an issue when writing a custom loss function in keras, specifically when I use K.sum()
inside the loss function. To simplify, let's take the below example:
This works fine:
from keras import backend as K

def custom_loss(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true))
Now, if I want to normalize y_pred before evaluating the loss above:
def custom_loss(y_true, y_pred):
    y_pred = y_pred / K.sum(y_pred, axis=-1)
    return K.mean(K.abs(y_pred - y_true))
I get the error below during model.fit_generator():
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [64,9] vs. [64]
[[{{node loss_13/dense_2_loss/truediv}}]]
[[metrics_9/mean_absolute_error/Mean_1/_5003]]
(1) Invalid argument: Incompatible shapes: [64,9] vs. [64]
[[{{node loss_13/dense_2_loss/truediv}}]]
0 successful operations.
0 derived errors ignored.
I've seen many questions regarding the Incompatible shapes error, but none of them seemed to involve the use of K.sum().
I can see that 64 is the batch size and 9 is the number of classes I have (both y_true and y_pred are expected to have shape (64, 9)).
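For what it's worth, the same broadcasting failure can be reproduced outside Keras with plain NumPy (a minimal sketch with dummy values, just to show the shapes):

import numpy as np

y_pred = np.ones((64, 9))        # dummy batch: 64 samples, 9 classes
row_sums = y_pred.sum(axis=-1)   # shape (64,) -- the class axis is squeezed out
y_pred / row_sums                # ValueError: operands could not be broadcast
                                 # together with shapes (64,9) (64,)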
I've added some print statements to see what happens during model.compile(), and here are the outputs:
def custom_loss(y_true, y_pred):
    print(f"Shape of y_pred before normalization: {y_pred.shape}")
    y_pred = y_pred / K.sum(y_pred, axis=-1)
    print(f"Shape of y_pred after normalization: {y_pred.shape}")
    return K.mean(K.abs(y_pred - y_true))
Compiling
# compile the model
model.compile(loss=custom_loss, metrics=['mae'], optimizer='Adam')
# outputs
# Shape of y_pred before normalization: (?, 9)
# Shape of y_pred after normalization: (?, 9)
Version Info:
keras 2.2.4
keras-applications 1.0.8
keras-preprocessing 1.1.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
Upvotes: 0
Views: 961
Reputation: 4475
So basically, K.sum(y_pred, axis=-1) calculates the sum along the last dimension. K.sum() has another argument, keepdims, which defaults to False, so after summing along the last dimension that dimension is squeezed out, leaving shape (64,) instead of (64, 9). Since you want to normalize y_pred, you should keep the last dimension so that broadcasting works:
def custom_loss_norm(y_true, y_pred):
    y_pred = y_pred / K.sum(y_pred, axis=-1, keepdims=True)
    return K.mean(K.abs(y_pred - y_true))
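You can verify the shape difference with a quick NumPy check (dummy values, just for illustration):

import numpy as np

y_pred = np.random.rand(64, 9)
print(np.sum(y_pred, axis=-1).shape)                 # (64,)   -- squeezed, not broadcastable against (64, 9)
print(np.sum(y_pred, axis=-1, keepdims=True).shape)  # (64, 1) -- broadcasts row-wise against (64, 9)

normalized = y_pred / np.sum(y_pred, axis=-1, keepdims=True)
print(normalized.sum(axis=-1))                       # each row now sums to 1.0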
Upvotes: 1