What is the desired behavior of average pooling with padding?

Question

I recently trained a neural network in PyTorch that includes an average pooling layer with padding. I'm confused about how that layer behaves and about how average pooling with padding is defined.

For example, suppose we have an input tensor:

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

With a padding of 1 and a kernel size of 3, the input to the first (top-left) kernel window should be:

 0, 0, 0
 0, 1, 2
 0, 4, 5

The output from PyTorch is 12/4 = 3 (ignoring the padded zeros), but I think it should be 12/9 ≈ 1.333.

Can anyone explain this to me?

Much appreciated.

Shai · Accepted Answer

It's basically up to you to decide how you want your padded pooling layer to behave.
This is why PyTorch's average pooling layers (e.g., nn.AvgPool2d) have an optional parameter count_include_pad, which defaults to True.
With the default (True), the layer first pads the input and then treats all elements the same, so the padded zeros count toward the divisor. In this case the output for your example would indeed be 12/9 ≈ 1.33.
On the other hand, if you set count_include_pad=False, the pooling layer ignores the padded elements when averaging, and the result for your example would be 12/4 = 3.
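
To check both behaviors on the tensor from the question, a minimal sketch along these lines should work. It assumes stride=1 (the question does not state the stride), and the variable names are only illustrative:

```python
import torch
import torch.nn as nn

# The 3x3 input from the question, shaped (batch, channels, height, width).
x = torch.arange(1.0, 10.0).reshape(1, 1, 3, 3)

# Assumption: stride=1, since the question does not give a stride.
pool_with_pad = nn.AvgPool2d(kernel_size=3, stride=1, padding=1,
                             count_include_pad=True)
pool_without_pad = nn.AvgPool2d(kernel_size=3, stride=1, padding=1,
                                count_include_pad=False)

out_with_pad = pool_with_pad(x)
out_without_pad = pool_without_pad(x)

# The top-left window covers 1, 2, 4, 5 plus five padded zeros.
print(out_with_pad[0, 0, 0, 0].item())     # divisor 9: 12 / 9
print(out_without_pad[0, 0, 0, 0].item())  # divisor 4: 12 / 4
```

Only windows that overlap the padding differ between the two settings. The center window covers all nine real values, so both layers return 5 there.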
