lmz

Reputation: 386

What is the difference between "Dropout", "Monte-Carlo Dropout" and "Channel-wise Dropout"?

I have come across the above terms and I am unsure about the difference between them.

My understanding is that MC dropout is normal dropout which is also kept active at test time, allowing us to get an estimate of model uncertainty from multiple test runs. As for channel-wise dropout, I am clueless.

Bonus: What is a simple way to implement MC dropout and channel-wise dropout in Keras?

Upvotes: 6

Views: 3220

Answers (1)

Vlad

Reputation: 8595

You are correct: MC Dropout is applied at inference time as well, unlike regular dropout, which is disabled at test time. If you google it you can easily find plenty of information on both.

Regarding channel-wise dropout, my understanding is that instead of dropping individual neurons, it drops entire channels (feature maps).

Now for the implementation in Keras (I'm going to use tf.keras).

MC Dropout

As usual in Keras, you define a custom layer that applies dropout regardless of whether it is in training or testing mode, so we can just use tf.nn.dropout() with a constant dropout rate:

import tensorflow as tf

class MCDropout(tf.keras.layers.Layer):
    def __init__(self, rate):
        super(MCDropout, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # apply dropout unconditionally, i.e. at inference time as well
        return tf.nn.dropout(inputs, rate=self.rate)

Usage example:

import tensorflow as tf
import numpy as np

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=6, kernel_size=3))
model.add(MCDropout(rate=0.5))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(2))
model.compile(optimizer=tf.keras.optimizers.SGD(0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data for illustration
x_train = np.random.normal(size=(10, 4, 4, 3))
x_train = np.vstack([x_train, 2*np.random.normal(size=(10, 4, 4, 3))])
y_train = [[1, 0] for _ in range(10)] + [[0, 1] for _ in range(10)]
y_train = np.array(y_train)

model.fit(x_train,
          y_train,
          epochs=2,
          batch_size=10,
          validation_data=(x_train, y_train))
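
Once the model is trained, you can get a rough uncertainty estimate by running several stochastic forward passes on the same input and looking at the spread of the predictions. A minimal sketch (the number of passes, 20, is arbitrary):

n_passes = 20  # arbitrary number of stochastic forward passes
# dropout stays active at inference, so repeated predictions differ
preds = np.stack([model.predict(x_train, batch_size=10) for _ in range(n_passes)])
mean_pred = preds.mean(axis=0)  # averaged prediction
std_pred = preds.std(axis=0)    # spread across passes ~ model uncertainty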

Channel-Wise Dropout

Here you can use the same tf.nn.dropout() function; however, you have to specify the noise shape. The documentation of tf.nn.dropout() gives an example of exactly how to drop whole channels:

shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.

This is what we're going to do in the call() method:

class ChannelWiseDropout(tf.keras.layers.Layer):
    def __init__(self, rate):
        super(ChannelWiseDropout, self).__init__()
        self.rate = rate

    def call(self, inputs):
        shape = tf.keras.backend.shape(inputs)
        # broadcast the dropout mask over the spatial dimensions,
        # so each channel is kept or dropped as a whole
        noise_shape = (shape[0], 1, 1, shape[-1])
        return tf.nn.dropout(inputs,
                             rate=self.rate,
                             noise_shape=noise_shape)

Applying it to some example:

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(4, 4, 3)))
model.add(tf.keras.layers.Conv2D(filters=3, kernel_size=3))
model.add(ChannelWiseDropout(rate=0.5))

x_train = np.random.normal(size=(1, 4, 4, 3))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(model.output, feed_dict={model.inputs[0]:x_train})
    print(res[:, :, :, 0])
    print(res[:, :, :, 1])
    print(res[:, :, :, 2])
# [[[2.5495746  1.3060737 ]
#   [0.47009617 1.0427766 ]]]
# [[[-0.  0.]
#   [-0. -0.]]]                <-- second and third channels were dropped
# [[[-0. -0.]
#   [-0. -0.]]]

Note

I'm using tf.__version__ == '1.13.1'. Older versions of tf use keep_prob = 1 - rate instead of the rate argument.
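
For example, with the old signature the call in MCDropout would look roughly like this (a sketch, assuming keep_prob is still accepted on your version):

    def call(self, inputs):
        # older tf versions take keep_prob (the keep probability) instead of rate
        return tf.nn.dropout(inputs, keep_prob=1 - self.rate)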

Upvotes: 8
