Reputation: 386
I have come across the above terms and I am unsure about the difference between them.
My understanding is that MC dropout is normal dropout which is also active during test time, allowing us to get an estimate for model uncertainty on multiple test runs. As for channel-wise dropout, I am clueless.
Bonus: What is a simple way to implement MC dropout and channel-wise dropout in Keras?
Upvotes: 6
Views: 3220
Reputation: 8595
You are correct: MC Dropout is applied during inference as well, unlike regular dropout, which is only active at training time. If you search for it you can easily find plenty of information on both.
Regarding channel-wise dropout, my understanding is that instead of dropping particular neurons, it drops entire channels (feature maps).
Now for the implementation in Keras (I'm going to use tf.keras).
MC Dropout
As usual in Keras, you define a custom layer that applies dropout regardless of whether it is training or testing, so we can just use tf.nn.dropout() with a constant dropout rate:
import tensorflow as tf


class MCDropout(tf.keras.layers.Layer):
    def __init__(self, rate):
        super(MCDropout, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # dropout is applied unconditionally, i.e. at inference time too
        return tf.nn.dropout(inputs, rate=self.rate)
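Side note: if you'd rather not write a custom layer, I believe you can get the same behaviour by passing training=True when calling the built-in Dropout layer in the functional API (a rough sketch, not tested on every version):

inp = tf.keras.layers.Input(shape=(4, 4, 3))
x = tf.keras.layers.Conv2D(filters=6, kernel_size=3)(inp)
x = tf.keras.layers.Dropout(0.5)(x, training=True)  # training=True keeps dropout active at inference
x = tf.keras.layers.Flatten()(x)
out = tf.keras.layers.Dense(2)(x)
mc_model = tf.keras.models.Model(inp, out)

Below I'll stick with the custom MCDropout layer.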
Usage example:
import tensorflow as tf
import numpy as np

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=6, kernel_size=3))
model.add(MCDropout(rate=0.5))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(2))

model.compile(optimizer=tf.keras.optimizers.SGD(0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data for illustration
x_train = np.random.normal(size=(10, 4, 4, 3))
x_train = np.vstack([x_train, 2*np.random.normal(size=(10, 4, 4, 3))])
y_train = [[1, 0] for _ in range(10)] + [[0, 1] for _ in range(10)]
y_train = np.array(y_train)

model.fit(x_train,
          y_train,
          epochs=2,
          batch_size=10,
          validation_data=(x_train, y_train))
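To actually get the uncertainty estimate mentioned in the question, you run the same inputs through the model several times and look at the spread of the predictions (each forward pass samples a different dropout mask). A minimal sketch, assuming the model above has been trained; x_test and n_passes are just made-up names for illustration:

# hypothetical test batch with the same shape as the training data
x_test = np.random.normal(size=(5, 4, 4, 3))

n_passes = 100  # number of stochastic forward passes
preds = np.stack([model.predict(x_test) for _ in range(n_passes)])  # shape (n_passes, 5, 2)

mean_pred = preds.mean(axis=0)  # Monte Carlo estimate of the prediction
std_pred = preds.std(axis=0)    # spread across passes -> rough uncertainty measure
print(mean_pred, std_pred)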
Channel-Wise Dropout
Here you can use the same tf.nn.dropout() function; however, you have to specify the noise shape. The documentation of tf.nn.dropout() gives exactly this example of how to drop whole channels:
if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.
This is what we're going to do in the call() method:
class ChannelWiseDropout(tf.keras.layers.Layer):
    def __init__(self, rate):
        super(ChannelWiseDropout, self).__init__()
        self.rate = rate

    def call(self, inputs):
        shape = tf.keras.backend.shape(inputs)
        # keep or drop each (batch, channel) pair as a whole; rows and columns go together
        noise_shape = (shape[0], 1, 1, shape[-1])
        return tf.nn.dropout(inputs,
                             rate=self.rate,
                             noise_shape=noise_shape)
Applying it to an example:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(4, 4, 3)))
model.add(tf.keras.layers.Conv2D(filters=3, kernel_size=3))
model.add(ChannelWiseDropout(rate=0.5))

x_train = np.random.normal(size=(1, 4, 4, 3))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(model.output, feed_dict={model.inputs[0]: x_train})
    print(res[:, :, :, 0])
    print(res[:, :, :, 1])
    print(res[:, :, :, 2])
# [[[2.5495746  1.3060737 ]
#   [0.47009617 1.0427766 ]]]
# [[[-0.  0.]
#   [-0. -0.]]]               <-- second and third channels were dropped
# [[[-0. -0.]
#   [-0. -0.]]]
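Worth noting: for a training-time-only version of channel-wise dropout, I believe Keras already provides tf.keras.layers.SpatialDropout2D, which drops entire feature maps; the custom layer above is mainly useful if you also want channels dropped at inference, MC-style. A rough sketch:

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(4, 4, 3)))
model.add(tf.keras.layers.Conv2D(filters=3, kernel_size=3))
model.add(tf.keras.layers.SpatialDropout2D(rate=0.5))  # drops whole channels, but only during training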
Note
I'm using tf.__version__ == '1.13.1'. Older versions of tf use a keep_prob = 1 - rate argument instead of rate.
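For example, on those older versions the call() method of MCDropout would look roughly like this (same effect, different argument name):

    def call(self, inputs):
        # older tf.nn.dropout signature takes keep_prob instead of rate
        return tf.nn.dropout(inputs, keep_prob=1.0 - self.rate)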
Upvotes: 8