Eweler

Reputation: 437

Fractional Max Pooling in TensorFlow

When using the function tf.nn.fractional_max_pool in TensorFlow, in addition to the pooled output tensor, it also returns a row_pooling_sequence and a col_pooling_sequence, which I presume are used in backpropagation to compute the gradient. This is in contrast to normal $2 \times 2$ max pooling, which just returns the pooled tensor.

My question is: do we have to handle the row_pooling_sequence and col_pooling_sequence values ourselves? How would we include them in a network to get backpropagation working properly? I modified a simple convolutional neural network to use fractional max pooling instead of $2 \times 2$ max pooling without making use of these values, and the results were much poorer, leading me to believe we must handle them explicitly.

Here's the relevant portion of my code that makes use of the FMP:

def add_layer_ops_FMP(conv_func, x_input, W, keep_prob_layer, training_phase):

    h_conv = conv_func(x_input, W, stride_l=1)
    h_BN = batch_norm(h_conv, training_phase, epsilon)
    h_elu = tf.nn.elu(h_BN)  # ELU activation layer - change accordingly

    def dropout_no_training(h_elu=h_elu):
        return dropout_op(h_elu, keep_prob=1.0)

    def dropout_in_training(h_elu=h_elu, keep_prob_layer=keep_prob_layer):
        return dropout_op(h_elu, keep_prob=keep_prob_layer)

    h_drop = tf.cond(training_phase, dropout_in_training, dropout_no_training)
    # FMP layer. See Ben Graham's paper.
    # Note: pooling_ratio is a required argument; the value below is illustrative.
    h_pool, row_pooling_sequence, col_pooling_sequence = tf.nn.fractional_max_pool(
        h_drop, pooling_ratio=[1.0, 1.44, 1.44, 1.0])

    return h_pool

Link to function on github.

Upvotes: 4

Views: 3095

Answers (1)

Zhongyu Kuang

Reputation: 5344

  1. Do we need to handle row_pooling_sequence and col_pooling_sequence?

Even though the tf.nn.fractional_max_pool documentation says it returns 2 extra tensors which are needed to calculate the gradient, I believe we do not need to handle these 2 extra tensors specially or add them into the gradient calculation ourselves. The backpropagation of tf.nn.fractional_max_pool in TensorFlow is already registered into the gradient calculation flow by the _FractionalMaxPoolGrad function. As you can see in _FractionalMaxPoolGrad, the row_pooling_sequence and col_pooling_sequence are extracted via op.outputs[1] and op.outputs[2] and used to calculate the gradient.

@ops.RegisterGradient("FractionalMaxPool")
def _FractionalMaxPoolGrad(op, grad_0, unused_grad_1, unused_grad_2):
  """..."""
  return gen_nn_ops._fractional_max_pool_grad(op.inputs[0], op.outputs[0],
                                              grad_0, op.outputs[1],
                                              op.outputs[2],
                                              op.get_attr("overlapping"))
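
A quick way to convince yourself: take the gradient of a dummy loss through tf.nn.fractional_max_pool directly. The sketch below uses the TF 1.x API as in your code; the input shape and pooling_ratio are just illustrative.

import numpy as np
import tensorflow as tf

# Minimal sketch: the gradient flows through tf.nn.fractional_max_pool
# automatically via the registered _FractionalMaxPoolGrad; the two
# pooling sequences never have to be handled by hand.
x = tf.constant(np.random.rand(1, 8, 8, 1), dtype=tf.float32)
pooled, row_seq, col_seq = tf.nn.fractional_max_pool(
    x, pooling_ratio=[1.0, 1.44, 1.44, 1.0])

loss = tf.reduce_sum(pooled)     # dummy scalar loss
grad = tf.gradients(loss, x)[0]  # no manual use of row_seq/col_seq needed

with tf.Session() as sess:
    print(sess.run(grad).shape)  # (1, 8, 8, 1) -- same shape as the input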
  2. Possible reasons for poorer performance after using fractional_max_pool (in my personal opinion).

In the fractional max pooling paper, the author used fractional max pooling in a spatially-sparse convolutional network. In his spatially-sparse convolutional network design, he actually extended the input's spatial size by padding it with zeros. Additionally, fractional max pooling downsizes the input by a factor of pooling_ratio, which is often less than 2. Together, these allow stacking more convolutional layers than regular max pooling does, and hence building a deeper network. (For example, on the CIFAR-10 dataset, the (non-padded) input spatial size is 32x32; the spatial size drops to 4x4 after 3 convolutional layers and 3 max pooling operations, whereas with fractional max pooling at pooling_ratio=1.4 it only drops to 4x4 after 6 convolutional and 6 fractional max pooling layers.)

I experimented with building a CNN of 2 conv layers + 2 pooling layers (regular max pooling vs. fractional max pooling with pooling_ratio=1.47) + 2 fully connected layers on the MNIST dataset. The one using regular max pooling produced better performance than the one using fractional max pooling (roughly 15~20% better). Comparing the spatial sizes before the fully connected layers, the model with regular max pooling has a spatial size of 7x7, while the one with fractional max pooling has a spatial size of 12x12. Adding one more conv + fractional_max_pool stage to the latter model (dropping the final spatial size to 8x8) improved its performance to a level more comparable with the former model using regular max pooling.
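
As a back-of-the-envelope check of those numbers (pure arithmetic, ignoring the integer rounding of actual feature-map sizes):

# How the spatial size shrinks per pooling op: regular 2x2 max pooling
# divides it by 2, fractional max pooling divides it by pooling_ratio.
for ratio, n_pools in [(2.0, 3), (1.4, 6)]:
    size = 32.0  # e.g. CIFAR-10 spatial size
    for _ in range(n_pools):
        size /= ratio
    print('ratio %.1f, %d pools -> %.1f' % (ratio, n_pools, size))
# ratio 2.0, 3 pools -> 4.0
# ratio 1.4, 6 pools -> 4.2

So at pooling_ratio=1.4 roughly twice as many conv + pool stages fit before the feature map becomes too small.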

In summary, I personally think the good performance in the Fractional Max-Pooling paper is achieved by a combination of a spatially-sparse CNN, fractional max pooling, and small filters (and network-in-network), which together enable building a deep network even when the input image's spatial size is small. Hence, in a regular CNN, simply replacing regular max pooling with fractional max pooling does not necessarily give you better performance.

Upvotes: 6
