Timi

Reputation: 892

Confusion about TensorFlow's max_pooling functions

I found this information in the TensorFlow documentation:

tf.layers.max_pooling1d?
Max Pooling layer for 1D inputs.

Arguments:
   inputs: The tensor over which to pool. Must have rank 3.

And:

tf.layers.max_pooling2d?

Max pooling layer for 2D inputs (e.g. images).

Arguments:
   inputs: The tensor over which to pool. Must have rank 4.

My confusion is: why must the inputs have rank 3 and rank 4, respectively?

Upvotes: 0

Views: 175

Answers (1)

spettekaka

Reputation: 531

What might cause your confusion is that the rank counts more than the spatial dimensions: one dimension is the batch and one is the channels.

For 2D inputs (let's say images), the 4 dimensions correspond to the following:

  • N refers to the number of images in a batch.
  • H refers to the number of pixels in the vertical (height) dimension.
  • W refers to the number of pixels in the horizontal (width) dimension.
  • C refers to the channels. For example, 1 for black and white or grayscale and 3 for RGB.

Depending on whether you choose channels_first or channels_last, the dimensions are ordered NCHW or NHWC, respectively.
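To make the two orderings concrete, here is a small sketch in plain Python (no TensorFlow needed) that writes down the shape of one batch in both layouts. The batch size and image size are made-up values for illustration, and `permute` is a hypothetical helper, not a TensorFlow function.

```python
# Shapes only; the numbers are illustrative assumptions, not taken from the docs.
N, H, W, C = 32, 28, 28, 3

nhwc_shape = (N, H, W, C)   # channels_last
nchw_shape = (N, C, H, W)   # channels_first


def permute(shape, order):
    """Reorder the axes of a shape tuple (hypothetical helper)."""
    return tuple(shape[i] for i in order)


# Moving the channel axis from the last to the second position: NHWC -> NCHW.
assert permute(nhwc_shape, (0, 3, 1, 2)) == nchw_shape

# Either way there are four axes, which is what "rank 4" means.
print(len(nhwc_shape), len(nchw_shape))
```

Only the order of the axes differs between the two layouts; the number of axes stays at four.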

For 1D inputs there is only one spatial dimension, either H or W (I prefer thinking of it as W, but that's up to you), so you have NCW (channels_first) or NWC (channels_last).
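As a rough illustration of why the 1D case still needs three axes, here is a minimal pure-Python sketch of max pooling over an NWC input stored as nested lists. It is not TensorFlow's implementation: `max_pool_1d_nwc` is a hypothetical helper, and it assumes channels_last ordering and "valid" padding (windows that would run past the end are dropped).

```python
def max_pool_1d_nwc(inputs, pool_size, strides):
    """Max-pool a nested list shaped [N][W][C] along the W axis.

    Hypothetical helper for illustration; not TensorFlow's implementation.
    """
    outputs = []
    for sample in inputs:                      # loop over the batch axis N
        width = len(sample)
        channels = len(sample[0])
        pooled = []
        # Slide a window along W; incomplete windows at the end are skipped.
        for start in range(0, width - pool_size + 1, strides):
            window = sample[start:start + pool_size]
            # Take the maximum separately for each channel.
            pooled.append([max(step[c] for step in window) for c in range(channels)])
        outputs.append(pooled)
    return outputs


# One sample (N=1), six steps (W=6), two channels (C=2).
batch = [[[1, 10], [3, 20], [2, 60], [5, 40], [4, 30], [0, 50]]]
result = max_pool_1d_nwc(batch, pool_size=2, strides=2)
print(result)  # [[[3, 20], [5, 60], [4, 50]]]
```

Pooling shrinks only the W axis (6 to 3 here); the batch and channel axes pass through unchanged, so the output is still rank 3.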

For more information about how the ordering (channels_first or channels_last) can affect computation speed, you might want to take a look at the TensorFlow Performance Guide, which is where I got the above information.

Upvotes: 2
