aerin

Reputation: 22724

Effect of max_pool in Convolutional Neural Network [tensorflow]

I'm following the Udacity Deep Learning course by Vincent Vanhoucke and trying to understand the practical (or intuitive, or obvious) effect of max pooling.

Let's say my current model (without pooling) uses convolutions with stride 2 to reduce the dimensionality.

  def model(data):
    # Stride-2 convolutions do the downsampling: each conv halves the
    # spatial dimensions of its input.
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    # Flatten and run through the fully connected layers.
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

Now I've introduced pooling, replacing the strides with a max pooling operation (nn.max_pool()) of stride 2 and kernel size 2:

  def model(data):
    # Stride-1 convolutions preserve the spatial dimensions; the 2x2,
    # stride-2 max pools now do the downsampling instead.
    conv1 = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    bias1 = tf.nn.relu(conv1 + layer1_biases)
    pool1 = tf.nn.max_pool(bias1, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.conv2d(pool1, layer2_weights, [1, 1, 1, 1], padding='SAME')
    bias2 = tf.nn.relu(conv2 + layer2_biases)
    pool2 = tf.nn.max_pool(bias2, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    # Flatten and run through the fully connected layers.
    shape = pool2.get_shape().as_list()
    reshape = tf.reshape(pool2, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

What would be a compelling reason to use the latter model instead of the no-pool model, besides the improved accuracy? I'd love to hear some insights from people who have already used CNNs many times!

Upvotes: 0

Views: 1909

Answers (2)

nessuno

Reputation: 27070

In a classification task, improving the accuracy is the goal.

However, pooling allows you to:

  1. Reduce the input dimensionality.
  2. Force the network to learn particular features, depending on the type of pooling you apply.

Reducing the input dimensionality is something you want because it forces the network to project its learned representations into a different, lower-dimensional space. This is good computationally speaking, because you have to allocate less memory and can therefore use bigger batches. But it's also desirable because high-dimensional spaces usually contain a lot of redundancy, and are spaces in which all objects appear sparse and dissimilar (see The curse of dimensionality).
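
To make the size reduction concrete, here's a minimal sketch (assuming TensorFlow 2.x eager mode, which differs slightly from the question's TF1-style code; the shapes are made up for illustration): each 2x2, stride-2 max pool halves the height and width, so the tensor feeding the dense layers shrinks by 4x per pooling stage.

  import tensorflow as tf

  # Hypothetical batch of 28x28 feature maps with 16 channels.
  x = tf.random.normal([8, 28, 28, 16])

  p1 = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='SAME')   # -> (8, 14, 14, 16)
  p2 = tf.nn.max_pool2d(p1, ksize=2, strides=2, padding='SAME')  # -> (8, 7, 7, 16)

  print(x.shape, p1.shape, p2.shape)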

Moreover, the function you choose for the pooling operation can force the network to give more importance to some features.
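
As a quick illustration (a sketch assuming TensorFlow 2.x; avg_pool2d appears here only for contrast), max pooling keeps just the peak activation in each window, while average pooling blends the whole window:

  import tensorflow as tf

  # A single 2x2 window holding one strong activation.
  x = tf.reshape(tf.constant([[4.0, 0.0],
                              [0.0, 0.0]]), [1, 2, 2, 1])

  print(tf.nn.max_pool2d(x, 2, 2, 'SAME').numpy().squeeze())  # 4.0 -- the peak survives
  print(tf.nn.avg_pool2d(x, 2, 2, 'SAME').numpy().squeeze())  # 1.0 -- the window is averaged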

Max pooling, for instance, is widely used because it allows the network to be robust to small variations of the input image.

What happens in practice is that only the features with the highest activations pass through the max-pooling gate. If the input image is shifted by a small amount, the max-pooling op produces the same output even though the input has moved (the tolerated shift is thus bounded by the kernel size).
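
A minimal sketch of this robustness (again assuming TensorFlow 2.x; the input values are made up for illustration): shifting a strong activation by one pixel within the same pooling window leaves the pooled output unchanged.

  import numpy as np
  import tensorflow as tf

  pool = lambda t: tf.nn.max_pool2d(t, ksize=2, strides=2, padding='SAME')

  # One strong activation at (1, 1)...
  x = np.zeros((1, 4, 4, 1), dtype=np.float32)
  x[0, 1, 1, 0] = 1.0

  # ...and the same activation shifted to (0, 0), still inside the same 2x2 window.
  x_shifted = np.zeros((1, 4, 4, 1), dtype=np.float32)
  x_shifted[0, 0, 0, 0] = 1.0

  # The shift is invisible downstream: both pooled outputs are identical.
  print(np.array_equal(pool(x).numpy(), pool(x_shifted).numpy()))  # True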

CNNs without pooling are also capable of learning this kind of feature, but at a greater cost in terms of parameters and computation time (see Striving for Simplicity: The All Convolutional Net).

Upvotes: 1

Salvador Dali

Reputation: 222889

Both approaches (strides and pooling) reduce the dimensionality of the input (for strides/pooling sizes > 1). This by itself is a good thing, because it reduces the computation time and the number of parameters, and helps prevent overfitting.

They achieve it in different ways: a strided convolution learns how to downsample, since the reduction is performed by the trainable filter itself, while max pooling is a fixed, parameter-free operation that simply keeps the strongest activation in each window (see the sketch below).
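
A hedged sketch of that contrast (TensorFlow 2.x assumed; the tensor and filter sizes are arbitrary): the stride-2 convolution downsamples through a trainable filter, while the max pool has no parameters at all, yet both halve the spatial resolution.

  import tensorflow as tf

  x = tf.random.normal([1, 28, 28, 16])
  w = tf.Variable(tf.random.normal([3, 3, 16, 16]))  # trainable 3x3 filter

  strided = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')
  pooled = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='SAME')

  print(strided.shape, pooled.shape)  # both (1, 14, 14, 16)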

You also mentioned "besides the improved accuracy". But almost everything people do in machine learning is done to improve the accuracy (or some other loss function). So if tomorrow someone shows that sum-square-root pooling achieves the best results on many benchmarks, a lot of people will start to use it.

Upvotes: 5
