Confusion about tensorflow's max_pooling function

Question

I found this in TensorFlow's documentation:

tf.layers.max_pooling1d?
Max Pooling layer for 1D inputs.

Arguments:
   inputs: The tensor over which to pool. Must have rank 3.

And:

tf.layers.max_pooling2d?

Max pooling layer for 2D inputs (e.g. images).

Arguments:
   inputs: The tensor over which to pool. Must have rank 4.

Why do the inputs need to have rank 3 and rank 4, respectively, rather than rank 1 and rank 2?

spettekaka · Accepted Answer

The likely source of confusion is that the rank counts more than the spatial dimensions: one dimension is the batch and another is the channels.

For 2D inputs (let's say images), the 4 dimensions correspond to the following:

N refers to the number of images in a batch.
H refers to the number of pixels in the vertical (height) dimension.
W refers to the number of pixels in the horizontal (width) dimension.
C refers to the channels. For example, 1 for black and white or grayscale and 3 for RGB.

Depending on whether you choose channels_first or channels_last, the dimensions are ordered NCHW or NHWC, respectively.
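To make the two orderings concrete, here is a minimal sketch in plain Python (no TensorFlow needed) that computes the shape a 2D max pooling step would produce. The helper name pooled_shape_2d is hypothetical, and it assumes "valid" padding, where each spatial size becomes floor((size - pool) / stride) + 1. Note that only H and W change; N and C pass through untouched.

```python
def pooled_shape_2d(input_shape, pool_size, strides, data_format="channels_last"):
    """Return the output shape of 2D max pooling with 'valid' padding.

    Hypothetical helper for illustration only; it works on shape tuples,
    not on real tensors.
    """
    if len(input_shape) != 4:
        raise ValueError("2D pooling expects a rank-4 shape (batch, 2 spatial, channels)")

    # Unpack according to the layout: NHWC or NCHW.
    if data_format == "channels_last":
        n, h, w, c = input_shape
    elif data_format == "channels_first":
        n, c, h, w = input_shape
    else:
        raise ValueError("data_format must be 'channels_last' or 'channels_first'")

    # Pooling only shrinks the spatial dimensions.
    out_h = (h - pool_size[0]) // strides[0] + 1
    out_w = (w - pool_size[1]) // strides[1] + 1

    if data_format == "channels_last":
        return (n, out_h, out_w, c)
    return (n, c, out_h, out_w)


# A batch of 32 RGB images of 28x28 pixels, in both layouts.
nhwc_out = pooled_shape_2d((32, 28, 28, 3), (2, 2), (2, 2), "channels_last")
nchw_out = pooled_shape_2d((32, 3, 28, 28), (2, 2), (2, 2), "channels_first")
print(nhwc_out)
print(nchw_out)
```

In both layouts the batch size (32) and channel count (3) are unchanged, which is why they have to be part of the input's rank even though pooling itself is "2D".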

For 1D inputs there is only one of H or W (I prefer to think of it as W, but that's up to you), so you have NCW (channels_first) or NWC (channels_last), which is rank 3.
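The 1D case can be sketched the same way. The following is not TensorFlow's implementation, just a plain-Python illustration on nested lists in NWC order, assuming "valid" padding; the function name max_pool_1d_nwc is made up for this example. It shows that the maximum is taken along W only, separately for every batch entry and every channel.

```python
def max_pool_1d_nwc(batch, pool_size, stride):
    """1D max pooling over nested lists indexed as batch[n][w][c].

    Illustrative sketch with 'valid' padding: windows that would run
    past the end of the sequence are dropped.
    """
    pooled = []
    for sample in batch:                      # loop over N
        width = len(sample)
        channels = len(sample[0])
        out = []
        for start in range(0, width - pool_size + 1, stride):
            window = sample[start:start + pool_size]
            # Take the max along W, independently for each channel.
            out.append([max(step[c] for step in window) for c in range(channels)])
        pooled.append(out)
    return pooled


# One sequence (N=1) of 4 steps (W=4) with 2 channels (C=2).
x = [[[1, 10], [3, 5], [2, 8], [4, 7]]]
y = max_pool_1d_nwc(x, pool_size=2, stride=2)
print(y)  # [[[3, 10], [4, 8]]]
```

The input has shape (1, 4, 2) and the output has shape (1, 2, 2): W is halved, while N and C are carried along unchanged.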

For more information about how the ordering (channels_first or channels_last) can affect the computation speed, you might want to take a look at the TensorFlow Performance Guide where I got the above information from.
