NeoZoom.lua

Reputation: 2901

Tensorflow: tf.nn.avg_pool() with 'SAME' padding does not average over padded pixels

I'm trying to understand tf.nn.avg_pool(). I don't understand how the first row of the result ends up as [1.0, 1.0, 1.0, 1.0].

import tensorflow as tf

img = tf.constant([
    [[0,4], [0,4], [0,4], [0,4]],
    [[1,5], [1,5], [1,5], [1,5]],
    [[2,6], [2,6], [2,6], [2,6]],
    [[3,7], [3,7], [3,7], [3,7]]
], dtype=tf.float32)
img = tf.reshape(img, [1, 4, 4, 2])  # avg_pool expects [batch, height, width, channels]

pooling2 = tf.nn.avg_pool(img, ksize=[1,4,4,1], strides=[1,1,1,1], padding='SAME')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print('pooling2.shape: {}'.format(sess.run(pooling2).shape))
    print('pooling2:\n{}'.format(
        sess.run(pooling2).transpose([0, 3, 1, 2]).reshape([2, 4, 4])))

the printed result is

pooling2.shape: (1, 4, 4, 2)
pooling2:
[[[1.  1.  1.  1. ]
  [1.5 1.5 1.5 1.5]
  [2.  2.  2.  2. ]
  [2.5 2.5 2.5 2.5]]

 [[5.  5.  5.  5. ]
  [5.5 5.5 5.5 5.5]
  [6.  6.  6.  6. ]
  [6.5 6.5 6.5 6.5]]]

It seems that one row was padded at the top and one column at the left, with two rows at the bottom and two columns at the right, and then the 4x4 window/kernel is applied to the padded result, aligned with the top-left corner (the padding arithmetic is sketched after the diagram below):

_ _ _ _ _ _ _
_ 0 0 0 0 _ _
_ 1 1 1 1 _ _
_ 2 2 2 2 _ _
_ 3 3 3 3 _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
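
If I am reading the SAME padding rule correctly, the amounts come out as 1 before and 2 after in each dimension (a small Python sketch, using the formula from TensorFlow's padding documentation):

in_size, ksize, stride = 4, 4, 1
out_size = -(-in_size // stride)                               # ceil(4 / 1) = 4
pad_total = max((out_size - 1) * stride + ksize - in_size, 0)  # 3
pad_before = pad_total // 2                                    # 1 (top / left)
pad_after = pad_total - pad_before                             # 2 (bottom / right)
print(pad_before, pad_after)                                   # 1 2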

Zooming in on the top-left corner:

_ _ _ _
_ 0 0 0
_ 1 1 1
_ 2 2 2

It seems that the reshaped pooling2[0, 0, 0], which is 1, comes from

(0+0+0 + 1+1+1 + 2+2+2) / 9

Why is it divided by 9 and not by 16?

Upvotes: 3

Views: 1683

Answers (1)

P-Gn

Reputation: 24591

Yes, padded pixels are not taken into account in the average. So with 4x4 pooling, results computed in the middle of the image are averaged over 16 values, but a value in a corner may be averaged over only 9 values when two of its edges are padded.
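
You can verify this for the corner of your example with a quick NumPy check (a minimal sketch recomputing pooling2[0, 0, 0, 0] by hand):

import numpy as np

# First channel of the 4x4 image from the question.
img = np.array([[0, 0, 0, 0],
                [1, 1, 1, 1],
                [2, 2, 2, 2],
                [3, 3, 3, 3]], dtype=np.float32)

# The 4x4 window at output position (0, 0) covers rows/cols -1..2 after SAME
# padding (1 before, 2 after), but only rows 0..2 and cols 0..2 really exist,
# so the average is taken over those 9 values.
valid_patch = img[0:3, 0:3]
print(valid_patch.mean())        # 1.0    -- matches pooling2[0, 0, 0, 0]
print(valid_patch.sum() / 16.0)  # 0.5625 -- what you would get if the padded
                                 #           zeros were counted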

You can see this for example in the TensorFlow source, in the call to CuDNN, where the option CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING is selected for average pooling. CuDNN also offers CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING, which would take padded pixels into account in the average, but TensorFlow does not expose this option.

This is one way in which average pooling can behave differently from a (strided) convolution, especially for layers with a small spatial extent.
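
To make the contrast concrete, here is a minimal sketch (TF 1.x style, matching the rest of this page) comparing avg_pool with a conv2d that uses a uniform 4x4 kernel of weight 1/16; with SAME padding the convolution sums in literal zeros at the border, so the two differ there:

import tensorflow as tf

x = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])

# Average pooling: padded positions are excluded from the mean.
pooled = tf.nn.avg_pool(x, [1, 4, 4, 1], [1, 1, 1, 1], 'SAME')

# Convolution with a uniform kernel: padded positions contribute zeros.
kernel = tf.ones([4, 4, 1, 1]) / 16.0
conved = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    p, c = sess.run([pooled, conved])
    print(p[0, 0, 0, 0])  # 5.0    -- sum of the 9 real pixels / 9
    print(c[0, 0, 0, 0])  # 2.8125 -- the same sum / 16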

Note that the situation is similar with max pooling: padded pixels are ignored (or equivalently, virtually set to a value of -inf).

import tensorflow as tf

x = -tf.ones((1, 4, 4, 1))
max_pool = tf.nn.max_pool(x, (1, 4, 4, 1), (1, 1, 1, 1), 'SAME')
sess = tf.InteractiveSession()
print(max_pool.eval().squeeze())
# [[-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]]

Clearly the documentation could be more explicit about it.

Upvotes: 4
