Reputation: 22724
I'm following the Udacity Deep Learning videos by Vincent Vanhoucke and trying to understand the practical (or intuitive, or obvious) effect of max pooling.
Let's say my current model (without pooling) uses convolutions with stride 2 to reduce the dimensionality.
def model(data):
    # Two stride-2 convolutions: each halves the spatial dimensions.
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    # Flatten the feature maps and run the fully connected classifier.
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
Now I've introduced pooling, per the assignment: replace the strides with a max-pooling operation (nn.max_pool()) of stride 2 and kernel size 2.
def model(data):
    # Stride-1 convolutions; the downsampling now comes from the 2x2 max-pooling layers.
    conv1 = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    bias1 = tf.nn.relu(conv1 + layer1_biases)
    pool1 = tf.nn.max_pool(bias1, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.conv2d(pool1, layer2_weights, [1, 1, 1, 1], padding='SAME')
    bias2 = tf.nn.relu(conv2 + layer2_biases)
    pool2 = tf.nn.max_pool(bias2, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    # Flatten the pooled feature maps and run the fully connected classifier.
    shape = pool2.get_shape().as_list()
    reshape = tf.reshape(pool2, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
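As a quick sanity check (a minimal sketch with hypothetical shapes, assuming TensorFlow 2.x eager mode), both variants halve the spatial dimensions at each stage, so after two stages the flattened size feeding layer3_weights is the same whichever model you pick:

import tensorflow as tf

data = tf.random.normal([16, 28, 28, 1])     # a batch of 28x28 grayscale images
weights = tf.random.normal([5, 5, 1, 8])     # 5x5 kernel, 1 input channel -> 8 output channels

# Variant A: downsampling done by the stride-2 convolution itself
a = tf.nn.conv2d(data, weights, strides=[1, 2, 2, 1], padding='SAME')

# Variant B: stride-1 convolution followed by 2x2 max pooling with stride 2
b = tf.nn.conv2d(data, weights, strides=[1, 1, 1, 1], padding='SAME')
b = tf.nn.max_pool(b, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

print(a.shape)  # (16, 14, 14, 8)
print(b.shape)  # (16, 14, 14, 8) -- same size, but B evaluated the filter at every position first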
What would be a compelling reason to use the latter model instead of the no-pooling model, besides the improved accuracy? I'd love some insight from people who have used CNNs many times!
Upvotes: 0
Views: 1909
Reputation: 27070
In a classification task, improving the accuracy is the goal.
However, pooling buys you more than accuracy alone:
Reducing the input dimensionality is something you want because it forces the network to project its learned representations into a different, lower-dimensional space. This is good computationally speaking, because you have to allocate less memory and can therefore use bigger batches. But it is also desirable in itself, because high-dimensional spaces have a lot of redundancy and are spaces in which all objects appear sparse and dissimilar (see The curse of dimensionality).
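As a back-of-the-envelope sketch of that memory argument (hypothetical sizes: 28x28 inputs, 16 feature maps, 64 hidden units in the first fully connected layer), every 2x2 pooling stage shrinks the flattened feature vector by 4x, and with it the fully connected weight matrix:

depth, hidden_units = 16, 64

flat_no_downsampling = 28 * 28 * depth    # spatial size left untouched
flat_after_two_pools = 7 * 7 * depth      # two 2x2 poolings: 28 -> 14 -> 7

print(flat_no_downsampling * hidden_units)  # 802816 weights in the first fully connected layer
print(flat_after_two_pools * hidden_units)  # 50176 weights -- 16x fewer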
Moreover, the function you choose for the pooling operation can force the network to give more importance to some features.
Max pooling, for instance, is widely used because it allows the network to be robust to small variations of the input image.
What happens in practice is that only the features with the highest activations pass through the max-pooling gate. If the input image is shifted by a small amount, the max-pooling op produces the same output even though the input has shifted (the maximum tolerated shift is roughly the kernel size), as the toy example below illustrates.
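A toy sketch of that behaviour (hypothetical 4x4 feature map, TensorFlow 2.x): as long as the peak activation stays inside the same 2x2 pooling window, shifting it by one pixel does not change the pooled output at all:

import tensorflow as tf

def pool_2x2(feature_map):
    # Add the batch and channel dimensions that max_pool expects, pool, then squeeze back.
    x = tf.reshape(tf.constant(feature_map, dtype=tf.float32), [1, 4, 4, 1])
    y = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    return tf.reshape(y, [2, 2]).numpy()

original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 5, 0],
            [0, 0, 0, 0]]

shifted = [[0, 9, 0, 0],   # the strongest activations moved one pixel to the right
           [0, 0, 0, 0],
           [0, 0, 0, 5],
           [0, 0, 0, 0]]

print(pool_2x2(original))  # [[9. 0.] [0. 5.]]
print(pool_2x2(shifted))   # [[9. 0.] [0. 5.]] -- identical despite the shift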
CNNs without pooling are also capable of learning this kind of feature, but at a bigger cost in terms of parameters and computing time (see Striving for Simplicity: The All Convolutional Net).
Upvotes: 1
Reputation: 222889
Both approaches (strides and pooling) reduce the dimensionality of the input (for strides/pooling sizes > 1). This by itself is a good thing, because it reduces the computation time and the number of parameters, and helps prevent overfitting.
They achieve it in different ways: a strided convolution skips spatial positions while its filter weights are being learned, whereas max pooling simply keeps the strongest activation in each window and has no parameters of its own.
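A small sketch of that contrast (using tf.keras layers for brevity; the sizes are hypothetical): both paths turn a 28x28 input into 14x14, but in the strided variant the downsampling is folded into the learned convolution, while the max-pooling layer adds zero parameters of its own:

import tensorflow as tf

strided = tf.keras.layers.Conv2D(8, 5, strides=2, padding='same')
pooled = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 5, strides=1, padding='same'),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
])

x = tf.zeros([1, 28, 28, 1])
print(strided(x).shape, pooled(x).shape)              # both (1, 14, 14, 8)
print(strided.count_params(), pooled.count_params())  # 208 and 208 -- pooling is parameter-free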
You also mentioned "besides the improved accuracy". But almost everything people do in machine learning is meant to improve the accuracy (or some other loss function). So if tomorrow someone shows that sum-square-root pooling achieves the best results on many benchmarks, a lot of people will start to use it.
Upvotes: 5