Reputation: 2901
I'm trying to understand tf.nn.avg_pool(). I don't see how the first row of the result comes out as [1.0, 1.0, 1.0, 1.0].
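import tensorflow as tf

# 4x4 image with two channels: channel 0 holds the row index, channel 1 the row index + 4.
img = tf.constant([
    [[0,4], [0,4], [0,4], [0,4]],
    [[1,5], [1,5], [1,5], [1,5]],
    [[2,6], [2,6], [2,6], [2,6]],
    [[3,7], [3,7], [3,7], [3,7]]
], dtype=tf.float32)
# avg_pool expects a 4-D NHWC tensor; the printed shape (1, 4, 4, 2) implies a batch dimension.
img = tf.reshape(img, [1, 4, 4, 2])
pooling2 = tf.nn.avg_pool(img, ksize=[1,4,4,1], strides=[1,1,1,1], padding='SAME')
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print('pooling2.shape: {}'.format(sess.run(pooling2).shape))
    # Move channels first so each channel prints as its own 4x4 block.
    print('pooling2:\n{}'.format(
        sess.run(pooling2).transpose([0,3,1,2]).reshape([2,4,4]) ))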
The printed result is:
pooling2.shape: (1, 4, 4, 2)
pooling2:
[[[1.  1.  1.  1. ]
  [1.5 1.5 1.5 1.5]
  [2.  2.  2.  2. ]
  [2.5 2.5 2.5 2.5]]

 [[5.  5.  5.  5. ]
  [5.5 5.5 5.5 5.5]
  [6.  6.  6.  6. ]
  [6.5 6.5 6.5 6.5]]]
It seems that it padded one row at the top and one column at the left, plus two rows at the bottom and two columns at the right, and then applied the 4x4 window/kernel to the padded result, starting from the top-left corner:
_ _ _ _ _ _ _
_ 0 0 0 0 _ _
_ 1 1 1 1 _ _
_ 2 2 2 2 _ _
_ 3 3 3 3 _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
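The one-before, two-after split in this layout is consistent with the rule TensorFlow documents for 'SAME' padding. Below is a minimal sketch of that rule for a single dimension; same_padding is a hypothetical helper name used only for illustration, not a TensorFlow function.

```python
import math

def same_padding(in_size, ksize, stride):
    """Return (pad_before, pad_after) along one dimension for 'SAME' padding."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + ksize - in_size, 0)
    pad_before = pad_total // 2          # the smaller half goes first
    pad_after = pad_total - pad_before   # any odd leftover goes at the end
    return pad_before, pad_after

# Height or width of 4, a window of 4, and a stride of 1, as in the question.
pads = same_padding(in_size=4, ksize=4, stride=1)
print(pads)  # (1, 2)
```

With a total padding of 3, the odd element lands at the bottom/right, which gives the 7x7 padded grid drawn above.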
Zooming in on the top-left corner:
_ _ _ _
_ 0 0 0
_ 1 1 1
_ 2 2 2
Why does the reshaped pooling2[0, 0, 0], which is 1, seem to come from (0+0+0 + 1+1+1 + 2+2+2) / 9? Why not / 16?
Upvotes: 3
Views: 1683
Reputation: 24591
Yes, pixels that are padded are not taken into account in the average. So with a 4x4 pooling, results computed in the middle of the image are averaged over 16 values, but a result in the corner, where two edges are padded, may be averaged over only 9 values.
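To make the counting concrete, here is a rough NumPy sketch (not TensorFlow's implementation) that pools channel 0 of the question's image while skipping padded cells. The function name avg_pool_same_exclude_pad is made up for this illustration, and it only handles stride 1.

```python
import numpy as np

def avg_pool_same_exclude_pad(x, k):
    """Stride-1 'SAME' average pooling of a 2-D array that skips padded cells."""
    h, w = x.shape
    before = (k - 1) // 2  # padding before the first row/column
    out = np.zeros((h, w))
    counts = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            # Clip the window to the image so that only real pixels are averaged.
            r0, r1 = max(i - before, 0), min(i - before + k, h)
            c0, c1 = max(j - before, 0), min(j - before + k, w)
            window = x[r0:r1, c0:c1]
            counts[i, j] = window.size
            out[i, j] = window.mean()
    return out, counts

# Channel 0 of the question's image: every row is filled with its row index.
channel0 = np.array([[0.0] * 4, [1.0] * 4, [2.0] * 4, [3.0] * 4])
pooled, counts = avg_pool_same_exclude_pad(channel0, k=4)
print(pooled)   # rows of 1.0, 1.5, 2.0 and 2.5, as in the question
print(counts)   # 9 at [0, 0], 16 at [1, 1]
```

The counts array is the divisor used at each position: only the window at [1, 1] covers the whole image and divides by 16.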
You can see this, for example, in the source code for the call to CuDNN, where the option CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING is selected for average pooling. CuDNN also offers CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING, which would take padded pixels into account in the average, but TensorFlow does not expose this option.
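This is one way in which average pooling behaves differently from a (strided) convolution, where zero-padded pixels do enter the sum. The difference matters most for layers with a small spatial extent, where a large share of the windows touch the border.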
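For contrast, the sketch below averages over the zero-padded grid with a fixed divisor of 16, which is what a convolution with a constant 1/16 kernel would compute. It is plain NumPy written for illustration and assumes the 1-before, 2-after padding discussed in the question.

```python
import numpy as np

# Channel 0 of the question's image.
channel0 = np.array([[0.0] * 4, [1.0] * 4, [2.0] * 4, [3.0] * 4])
k = 4

# Zero padding: 1 row/column before, 2 after.
padded = np.pad(channel0, ((1, 2), (1, 2)), mode="constant")

box = np.zeros_like(channel0)
for i in range(4):
    for j in range(4):
        # Padded zeros are part of the sum and the divisor is always k * k.
        box[i, j] = padded[i:i + k, j:j + k].sum() / (k * k)

print(box[0, 0])  # 0.5625, i.e. 9 / 16 instead of 1.0
print(box[1, 1])  # 1.5, the window here contains no padding
```

Only positions whose window lies fully inside the image agree with the avg_pool result; everywhere else the zeros pull the value down.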
Note that the situation is similar with max pooling: padded pixels are ignored (or equivalently, virtually set to a value of -inf).
import tensorflow as tf

# Every input value is -1, so zero padding would yield 0 wherever a window touches the border.
x = -tf.ones((1, 4, 4, 1))
max_pool = tf.nn.max_pool(x, (1, 4, 4, 1), (1, 1, 1, 1), 'SAME')
sess = tf.InteractiveSession()
print(max_pool.eval().squeeze())
# [[-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]
#  [-1. -1. -1. -1.]]
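The same experiment can be sketched without TensorFlow to show why the all -1 output rules out zero padding. The helper max_pool_same below is a hypothetical NumPy stand-in for stride-1 'SAME' max pooling, with the padding value made explicit.

```python
import numpy as np

def max_pool_same(x, k, pad_value):
    """Stride-1 'SAME' max pooling of a 2-D array with an explicit padding value."""
    h, w = x.shape
    before = (k - 1) // 2
    after = (k - 1) - before
    padded = np.pad(x, ((before, after), (before, after)),
                    mode="constant", constant_values=pad_value)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

x = -np.ones((4, 4))
with_neg_inf = max_pool_same(x, 4, -np.inf)  # padding can never win the max
with_zero = max_pool_same(x, 4, 0.0)         # padding wins wherever it appears

print(with_neg_inf)  # all -1, matching the TensorFlow output
print(with_zero)     # 0 everywhere except [1, 1], the only window without padding
```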
Clearly the documentation could be more explicit about it.
Upvotes: 4