Why do my model predictions show zero variance for multiple predictions when using Monte-Carlo dropout?

Question

I am working on an image segmentation CNN using Keras (PyTorch backend, if that matters). I am basing my code on the UNET segmentation code (here), which uses Monte-Carlo dropout (MCD) during prediction to approximate the uncertainty of the model's prediction. I trained my model without dropout and exported the weights so they can be re-imported at prediction time. When I run multiple predictions on the same image, I would expect each output to be slightly different due to the 50% dropout rate on the inner layers; however, my output is exactly the same for the 0th and 20th prediction on the same image.
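For context, the effect MC dropout should have can be shown with a toy NumPy sketch (made-up weights `W` and input `x`, not the UNET). With dropout off, repeated forward passes are identical. With a fresh dropout mask drawn on every pass, the outputs spread out, and their per-output variance is what serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": one fixed linear layer followed by a sigmoid.
# W, x and rate are illustrative values only.
W = rng.normal(size=(16, 1))
x = rng.normal(size=(1, 16))
rate = 0.5


def forward(x, rate, mc):
    """One forward pass; when mc is True a new dropout mask is drawn."""
    h = x
    if mc:
        keep = rng.random(h.shape) >= rate          # Bernoulli keep-mask
        h = np.where(keep, h / (1.0 - rate), 0.0)   # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-(h @ W)))


T = 20
deterministic = np.stack([forward(x, rate, mc=False) for _ in range(T)])
mc_samples = np.stack([forward(x, rate, mc=True) for _ in range(T)])

print("variance without dropout:", deterministic.var(axis=0).item())
print("variance with MC dropout:", mc_samples.var(axis=0).item())
```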

Is there a way to verify that the dropout is actually triggering? Another thought I have had is that loading the model overwrites my reconfigured model (the one with dropout turned on), so I switched from load_model() to model.load_weights() to avoid this, to no avail.

For reference, my toy dataset (self-generated) consists of 256 x 512 greyscale images, and I am attempting to use a 50% dropout rate during prediction.
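One explanation worth checking (my reading of Keras 3, not confirmed for every version): `model.predict()` calls the model with `training=False`, and a functional model passes that flag on to every layer whose `call` accepts `training`, which can override the `training=True` given when the graph was built. Assuming that is what happens here, a workaround is a small `Dropout` subclass (the name `MCDropout` is hypothetical) that ignores the flag and always drops units. The tiny stand-in model exists only so the sketch runs on its own; in the UNET each `Dropout(drop_rate)(..., training=drop_train)` would become `MCDropout(drop_rate)(...)`.

```python
import numpy as np
import keras
from keras import layers


class MCDropout(layers.Dropout):
    """Hypothetical Dropout variant that is active on every call.

    Forces training=True so that model.predict() (which passes
    training=False) still samples a new dropout mask on each pass.
    """

    def call(self, inputs, training=None):
        return super().call(inputs, training=True)


# Hypothetical stand-in model, only so the sketch is self-contained.
inp = keras.Input(shape=(8, 8, 1))
h = layers.Conv2D(4, (3, 3), padding="same", activation="relu")(inp)
h = MCDropout(0.5)(h)
out = layers.Conv2D(1, (1, 1), activation="sigmoid")(h)
model = keras.Model(inp, out)

x = np.random.rand(2, 8, 8, 1).astype("float32")
T = 10
# Stack T stochastic forward passes along a new last axis.
samples = np.stack(
    [model.predict(x, batch_size=1, verbose=0) for _ in range(T)], axis=-1
)
mc_mean = samples.mean(axis=-1)  # MC estimate of the prediction
mc_var = samples.var(axis=-1)    # per-pixel uncertainty
print("max per-pixel variance:", mc_var.max())
```

Calling `model(x, training=True)` instead of `predict()` would also switch dropout on, but it puts the `BatchNormalization` layers in training mode as well, so they would normalize with the statistics of the current batch rather than the moving averages learned during training. Forcing only the dropout layers avoids that.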

filepath is the path to the .keras file (without the extension, which the code appends) containing the pre-computed weights (no MCD) from the training set.

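import numpy as np
from keras.layers import (
    Activation,
    BatchNormalization,
    Conv2D,
    Conv2DTranspose,
    Dropout,
    Input,
    MaxPooling2D,
    UpSampling2D,
    concatenate,
)
from keras.models import Model

# N_filters, batch_normalization_axis, frames_test_set and filepath
# are defined earlier in the notebook (not shown).
drop_rate = 0.5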
drop_train = True  # MC dropout at inference
# drop_train=False #normal (no) dropout at inference
# downsize the UNET for this example.
# the smaller network is faster to train
# and produces excellent results on the dataset at hand
nfilters = (N_filters / 8).astype("int")

# input
input_tensor = Input(shape=frames_test_set.shape[1:], name="input_tensor")

## Encoder
# Encoder block 0
e0 = Conv2D(filters=nfilters[0], kernel_size=(3, 3), padding="same")(input_tensor)
e0 = BatchNormalization(axis=batch_normalization_axis)(e0)
e0 = Activation("relu")(e0)
e0 = Conv2D(filters=nfilters[0], kernel_size=(3, 3), padding="same")(e0)
e0 = BatchNormalization(axis=batch_normalization_axis)(e0)
e0 = Activation("relu")(e0)

# Encoder block 1
e1 = MaxPooling2D((2, 2))(e0)
e1 = Conv2D(filters=nfilters[1], kernel_size=(3, 3), padding="same")(e1)
e1 = BatchNormalization(axis=batch_normalization_axis)(e1)
e1 = Activation("relu")(e1)
e1 = Conv2D(filters=nfilters[1], kernel_size=(3, 3), padding="same")(e1)
e1 = BatchNormalization(axis=batch_normalization_axis)(e1)
e1 = Activation("relu")(e1)

# Encoder block 2
e2 = Dropout(drop_rate)(e1, training=drop_train)
e2 = MaxPooling2D((2, 2))(e2)
e2 = Conv2D(filters=nfilters[2], kernel_size=(3, 3), padding="same")(e2)
e2 = BatchNormalization(axis=batch_normalization_axis)(e2)
e2 = Activation("relu")(e2)
e2 = Conv2D(filters=nfilters[2], kernel_size=(3, 3), padding="same")(e2)
e2 = BatchNormalization(axis=batch_normalization_axis)(e2)
e2 = Activation("relu")(e2)

# Encoder block 3
e3 = Dropout(drop_rate)(e2, training=drop_train)
e3 = MaxPooling2D((2, 2))(e3)
e3 = Conv2D(filters=nfilters[3], kernel_size=(3, 3), padding="same")(e3)
e3 = BatchNormalization(axis=batch_normalization_axis)(e3)
e3 = Activation("relu")(e3)
e3 = Conv2D(filters=nfilters[3], kernel_size=(3, 3), padding="same")(e3)
e3 = BatchNormalization(axis=batch_normalization_axis)(e3)
e3 = Activation("relu")(e3)

# Encoder block 4
e4 = Dropout(drop_rate)(e3, training=drop_train)
e4 = MaxPooling2D((2, 2))(e4)
e4 = Conv2D(filters=nfilters[4], kernel_size=(3, 3), padding="same")(e4)
e4 = BatchNormalization(axis=batch_normalization_axis)(e4)
e4 = Activation("relu")(e4)
e4 = Conv2D(filters=nfilters[4], kernel_size=(3, 3), padding="same")(e4)
e4 = BatchNormalization(axis=batch_normalization_axis)(e4)
e4 = Activation("relu")(e4)
# e4 = MaxPooling2D((2, 2))(e4)

## Decoder
# Decoder block 3
d3 = Dropout(drop_rate)(e4, training=drop_train)
d3 = UpSampling2D((2, 2))(d3)
d3 = concatenate([e3, d3], axis=-1)  # skip connection
d3 = Conv2DTranspose(nfilters[3], (3, 3), padding="same")(d3)
d3 = BatchNormalization(axis=batch_normalization_axis)(d3)
d3 = Activation("relu")(d3)
d3 = Conv2DTranspose(nfilters[3], (3, 3), padding="same")(d3)
d3 = BatchNormalization(axis=batch_normalization_axis)(d3)
d3 = Activation("relu")(d3)

# Decoder block 2
d2 = Dropout(drop_rate)(d3, training=drop_train)
d2 = UpSampling2D((2, 2))(d2)
d2 = concatenate([e2, d2], axis=-1)  # skip connection
d2 = Conv2DTranspose(nfilters[2], (3, 3), padding="same")(d2)
d2 = BatchNormalization(axis=batch_normalization_axis)(d2)
d2 = Activation("relu")(d2)
d2 = Conv2DTranspose(nfilters[2], (3, 3), padding="same")(d2)
d2 = BatchNormalization(axis=batch_normalization_axis)(d2)
d2 = Activation("relu")(d2)

# Decoder block 1
d1 = UpSampling2D((2, 2))(d2)
d1 = concatenate([e1, d1], axis=-1)  # skip connection
d1 = Conv2DTranspose(nfilters[1], (3, 3), padding="same")(d1)
d1 = BatchNormalization(axis=batch_normalization_axis)(d1)
d1 = Activation("relu")(d1)
d1 = Conv2DTranspose(nfilters[1], (3, 3), padding="same")(d1)
d1 = BatchNormalization(axis=batch_normalization_axis)(d1)
d1 = Activation("relu")(d1)

# Decoder block 0
d0 = UpSampling2D((2, 2))(d1)
d0 = concatenate([e0, d0], axis=-1)  # skip connection
d0 = Conv2DTranspose(nfilters[0], (3, 3), padding="same")(d0)
d0 = BatchNormalization(axis=batch_normalization_axis)(d0)
d0 = Activation("relu")(d0)
d0 = Conv2DTranspose(nfilters[0], (3, 3), padding="same")(d0)
d0 = BatchNormalization(axis=batch_normalization_axis)(d0)
d0 = Activation("relu")(d0)

# output
# out_class = Dense(1)(d0)
out_class = Conv2D(1, (1, 1), padding="same")(d0)
out_class = Activation("sigmoid", name="output")(out_class)

# create and compile the model
model = Model(inputs=input_tensor, outputs=out_class)
model.compile(
    loss={"output": "binary_crossentropy"},
    metrics={"output": "accuracy"},
    optimizer="adam",
)

model.load_weights(f"{filepath}.keras")
Y_ts_hat = model.predict(frames_test_set, batch_size=1)

T = 10

Y_ts_hat_variance = np.zeros(
    (Y_ts_hat.shape[0], Y_ts_hat.shape[1], Y_ts_hat.shape[2], 1, T)
)

Y_ts_hat_variance[:, :, :, :, 0] = Y_ts_hat

for t in range(T - 1):
    print(f"Model {t+1}/{T-1}")
    Y_ts_hat_variance[:, :, :, :, t + 1] = model.predict(frames_test_set, batch_size=1)

arrays_identical = (
    Y_ts_hat_variance[25, :, :, 0, 0] == Y_ts_hat_variance[25, :, :, 0, -1]
).all()
print(f"Arrays identical: {arrays_identical}")

I have been running the model through 20 predictions and comparing the same result image from the first and last prediction to see whether they are identical. If MC dropout introduced any randomness, I would expect this check to return False; however, in all my testing the two are always identical.
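Comparing one image for exact equality between the first and last sample works, but a per-pixel variance over all T samples is a more direct test of whether MC dropout is doing anything. The helper below (`mc_summary` is a hypothetical name) assumes the samples are stacked on the last axis, as in `Y_ts_hat_variance` above, and is demonstrated on synthetic arrays.

```python
import numpy as np


def mc_summary(samples, atol=0.0):
    """Summarize Monte-Carlo predictions stacked on the last axis.

    samples: array of shape (N, H, W, C, T), like Y_ts_hat_variance.
    atol:    absolute tolerance; 0.0 means "bit-for-bit identical".

    Returns (per-pixel mean, per-pixel variance, number of later samples
    that differ from the first one anywhere).
    """
    mean = samples.mean(axis=-1)
    var = samples.var(axis=-1)
    first = samples[..., 0]
    n_different = sum(
        not np.allclose(first, samples[..., t], rtol=0.0, atol=atol)
        for t in range(1, samples.shape[-1])
    )
    return mean, var, n_different


# Synthetic check: identical samples vs. independent random samples.
same = np.repeat(np.full((2, 4, 4, 1, 1), 0.7), 5, axis=-1)
noisy = np.random.default_rng(0).random((2, 4, 4, 1, 5))
for name, s in [("identical", same), ("random", noisy)]:
    _, v, n = mc_summary(s)
    print(f"{name}: max variance = {v.max():.3g}, differing = {n}/{s.shape[-1] - 1}")
```

Applied to the question's array, a `var.max()` of essentially zero together with `n == 0` would indicate that every stochastic pass produced the same output.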
