Reputation: 45941
I'm using this:
Python version: 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]
TensorFlow version: 2.1.0
Eager execution: True
With this U-Net model:
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, UpSampling2D,
                                     Cropping2D, ZeroPadding2D, concatenate)
from tensorflow.keras.models import Model

inputs = Input(shape=img_shape)
conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", name='conv1_1')(inputs)
conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", name='conv1_2')(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool1')(conv1)
conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv2_1')(pool1)
conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv2_2')(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool2')(conv2)
conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv3_1')(pool2)
conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv3_2')(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool3')(conv3)
conv4 = Conv2D(256, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv4_1')(pool3)
conv4 = Conv2D(256, (4, 4), activation='relu', padding='same', data_format="channels_last", name='conv4_2')(conv4)
pool4 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool4')(conv4)
conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv5_1')(pool4)
conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv5_2')(conv5)
up_conv5 = UpSampling2D(size=(2, 2), data_format="channels_last", name='up_conv5')(conv5)
ch, cw = get_crop_shape(conv4, up_conv5)
crop_conv4 = Cropping2D(cropping=(ch, cw), data_format="channels_last", name='crop_conv4')(conv4)
up6 = concatenate([up_conv5, crop_conv4])
conv6 = Conv2D(256, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv6_1')(up6)
conv6 = Conv2D(256, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv6_2')(conv6)
up_conv6 = UpSampling2D(size=(2, 2), data_format="channels_last", name='up_conv6')(conv6)
ch, cw = get_crop_shape(conv3, up_conv6)
crop_conv3 = Cropping2D(cropping=(ch, cw), data_format="channels_last", name='crop_conv3')(conv3)
up7 = concatenate([up_conv6, crop_conv3])
conv7 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv7_1')(up7)
conv7 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv7_2')(conv7)
up_conv7 = UpSampling2D(size=(2, 2), data_format="channels_last", name='up_conv7')(conv7)
ch, cw = get_crop_shape(conv2, up_conv7)
crop_conv2 = Cropping2D(cropping=(ch, cw), data_format="channels_last", name='crop_conv2')(conv2)
up8 = concatenate([up_conv7, crop_conv2])
conv8 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv8_1')(up8)
conv8 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv8_2')(conv8)
up_conv8 = UpSampling2D(size=(2, 2), data_format="channels_last", name='up_conv8')(conv8)
ch, cw = get_crop_shape(conv1, up_conv8)
crop_conv1 = Cropping2D(cropping=(ch, cw), data_format="channels_last", name='crop_conv1')(conv1)
up9 = concatenate([up_conv8, crop_conv1])
conv9 = Conv2D(64, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv9_1')(up9)
conv9 = Conv2D(64, (3, 3), activation='relu', padding='same', data_format="channels_last", name='conv9_2')(conv9)
ch, cw = get_crop_shape(inputs, conv9)
conv9 = ZeroPadding2D(padding=(ch, cw), data_format="channels_last", name='conv9_3')(conv9)
conv10 = Conv2D(1, (1, 1), activation='sigmoid', data_format="channels_last", name='conv10_1')(conv9)
model = Model(inputs=inputs, outputs=conv10)
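(`get_crop_shape` is a small helper that isn't shown above; it returns the `((top, bottom), (left, right))` amounts needed to crop the skip-connection tensor down to the upsampled tensor's spatial size. A minimal sketch of what it typically computes, assuming channels-last tensors with known static shapes:)

```python
def get_crop_shape(target, refer):
    # crop amounts so target's H x W match refer's (channels_last: N, H, W, C)
    ch = int(target.shape[1]) - int(refer.shape[1])  # height surplus
    cw = int(target.shape[2]) - int(refer.shape[2])  # width surplus
    assert ch >= 0 and cw >= 0
    # split each surplus across both sides; an odd surplus puts the extra pixel last
    return (ch // 2, ch - ch // 2), (cw // 2, cw - cw // 2)
```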
And with these functions:
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    # the +1.0 terms smooth the ratio and avoid division by zero on empty masks
    return (2.0 * intersection + 1.0) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1.0)

def dice_coef_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)
To compile the model I do:
model.compile(tf.keras.optimizers.Adam(lr=2e-4), loss=dice_coef_loss, metrics=[dice_coef])
And I get this output while training:
Train on 5 samples, validate on 5 samples
Epoch 1/2
5/5 [==============================] - 8s 2s/sample - loss: 1.0000 - dice_coef: 4.5962e-05 - val_loss: 0.9929 - val_dice_coef: 0.0071
Epoch 2/2
5/5 [==============================] - 5s 977ms/sample - loss: 0.9703 - dice_coef: 0.0297 - val_loss: 0.9939 - val_dice_coef: 0.0061
I think the idea is for the loss to get close to zero, but I don't understand the 1.0000 I get at the start (maybe it is the worst possible loss value). And I don't understand the dice_coef value: what does it mean?
Upvotes: 0
Views: 221
Reputation: 445
Dice loss is a loss function that avoids some of the limitations of the ordinary cross-entropy loss.
When using cross-entropy loss, the statistical distribution of the labels plays a big role in training accuracy: the more unbalanced the label distribution is, the harder training becomes. Weighted cross-entropy can alleviate the difficulty, but the improvement is not significant and the intrinsic issue of cross-entropy loss remains. Cross-entropy is computed as the average of per-pixel losses, and each per-pixel loss is calculated discretely, without knowing whether adjacent pixels are boundaries or not. As a result, cross-entropy only considers loss in a micro sense rather than globally, which is not enough for image-level prediction.
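A toy example makes the imbalance point concrete (plain NumPy, and unsmoothed Dice for clarity): on a mask where only 1 of 100 pixels is foreground, a model that lazily predicts "almost all background" everywhere gets a deceptively low cross-entropy but an almost-zero Dice score:

```python
import numpy as np

# 1 foreground pixel out of 100
y_true = np.zeros(100)
y_true[0] = 1.0
# lazy model: predicts a flat 1% foreground probability everywhere
y_pred = np.full(100, 0.01)

eps = 1e-7  # numerical safety for the logs
cross_entropy = -np.mean(y_true * np.log(y_pred + eps)
                         + (1.0 - y_true) * np.log(1.0 - y_pred + eps))
intersection = np.sum(y_true * y_pred)
dice = 2.0 * intersection / (np.sum(y_true) + np.sum(y_pred))

print(cross_entropy)  # small: per-pixel averaging makes the model look fine
print(dice)           # near zero: almost no real overlap with the foreground
```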
The Dice coefficient can be written as:

    dice = (2 * Σ_i p_i g_i) / (Σ_i p_i + Σ_i g_i)

which is exactly what your function dice_coef(y_true, y_pred) is calculating (with an extra smoothing term of +1.0 in both numerator and denominator). More on that: Sørensen–Dice coefficient.
In the equation above, p_i and g_i are pairs of corresponding pixel values of the prediction and the ground truth, respectively. In a boundary-detection scenario, their values are either 0 or 1, representing whether the pixel is a boundary (value 1) or not (value 0). The denominator is the sum of total boundary pixels of both prediction and ground truth, and the numerator is the sum of correctly predicted boundary pixels, because the sum only increments when p_i and g_i match (both have value 1).
The denominator considers the total number of boundary pixels at global scale, while the numerator considers the overlap between the two sets at local scale. Therefore, Dice loss considers the loss information both locally and globally, which is critical for high accuracy.
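So the value your dice_coef metric reports is this overlap score, ranging from near 0 (no overlap) to 1 (perfect overlap). A quick NumPy re-implementation of your exact formula (including the +1 smoothing) shows the two extremes:

```python
import numpy as np

def dice_coef_np(y_true, y_pred, smooth=1.0):
    # same formula as the Keras dice_coef above, in plain NumPy
    y_true_f = y_true.ravel()
    y_pred_f = y_pred.ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

mask = np.array([1.0, 1.0, 0.0, 0.0])
print(dice_coef_np(mask, mask))        # perfect prediction -> 1.0
print(dice_coef_np(mask, 1.0 - mask))  # completely wrong -> 0.2 (smoothing keeps it above 0)
```

The dice_coef of 4.5962e-05 you logged in epoch 1 therefore just means the network started with essentially no overlap with the ground truth, which is expected.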
Regarding your training: since the loss value is decreasing, you shouldn't worry too much. Try increasing the number of epochs and analysing how the network evolves as training progresses.
The Dice loss is simply 1 - dice_coef, which is what your dice_coef_loss function is calculating.
Upvotes: 1