Aaron Robeson

Reputation: 312

Why does TensorFlow's max_pooling2d need input of rank 4?

I have data of shape [batch_size, x, y] and I want to pass it through a max-pooling layer that pools 2D sections in the x-y plane, leaving me with a 2D matrix of maximum values for each element in the batch. But TensorFlow's layers.max_pooling2d requires the input to have 4 dimensions. Is the only way around this to expand each example with a 'dummy' 4th dimension? Doing that is causing issues later in my model.

Upvotes: 1

Views: 156

Answers (1)

Alex Alifimoff

Reputation: 1849

The max pooling layer is built to be used with a 2-dimensional image, but with some number of channels. That's why the documentation says the input shape is "[batch_size, height, width, channels] if data_format is NHWC, and [batch_size, channels, height, width] if data_format is NCHW".

Channels in a typical 3-channel image would be the red, green, and blue components, or red, green, blue, and alpha in a 4-channel image. In this case you should probably just expand the dimensions and use a single channel (or revisit your data processing pipeline if you intend to have more than one channel).
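As a rough sketch of the single-channel approach: the snippet below assumes TensorFlow 2.x, where `tf.keras.layers.MaxPooling2D` stands in for the older `layers.max_pooling2d` from the question. The input shape (8 examples of 28x28) and the 2x2 pool size are made up for illustration.

```python
import tensorflow as tf

# Hypothetical rank-3 input: [batch_size, x, y] = [8, 28, 28].
x = tf.random.uniform([8, 28, 28])

# Add a trailing channel axis of size 1, giving NHWC layout:
# [batch_size, height, width, channels].
x4 = tf.expand_dims(x, axis=-1)

# 2x2 max pooling with stride 2 (assumed values, default "valid" padding).
pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
pooled = pool(x4)

print(x4.shape)      # (8, 28, 28, 1)
print(pooled.shape)  # (8, 14, 14, 1)
```

The channel axis carries no data of its own here. It only gives the layer the rank-4 layout it expects.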

Max pooling is of course more general than image processing, but I believe most of the work with it has been done on images, so the interface is likely an artifact of that common use case.

If you need to remove the extra dimension after the max pooling layer, just reshape the output (or squeeze the channel axis) to get back to rank 3.
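One way to keep the dummy dimension from leaking into the rest of the model is to wrap the expand, pool, and squeeze steps in a single function. This is only a sketch: `max_pool_rank3` is a hypothetical helper name, and it uses `tf.nn.max_pool2d` with an assumed 2x2 window rather than the layer from the question.

```python
import tensorflow as tf

def max_pool_rank3(x, pool_size=2):
    """Hypothetical helper: max-pool a [batch, height, width] tensor."""
    # Add a channel axis so the op sees [batch, height, width, 1].
    x4 = tf.expand_dims(x, axis=-1)
    pooled = tf.nn.max_pool2d(
        x4, ksize=pool_size, strides=pool_size, padding="VALID"
    )
    # Drop only the channel axis, so the result is rank 3 again.
    return tf.squeeze(pooled, axis=-1)

# One 4x4 example, shape [1, 4, 4].
x = tf.constant([[[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [9.0, 10.0, 11.0, 12.0],
                  [13.0, 14.0, 15.0, 16.0]]])

y = max_pool_rank3(x)
print(y.shape)  # (1, 2, 2)
print(y)        # maxima of each 2x2 window: 6, 8 / 14, 16
```

Passing `axis=-1` to `tf.squeeze` matters when the batch size happens to be 1: squeezing without an axis would also remove the batch dimension.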

Upvotes: 1
