Reputation: 344
By convention an image tensor is 3D: one dimension for its height, one for its width, and a third one for its color channels. Its shape looks like (height, width, channels).
For instance, a batch of 128 color images of size 256x256 could be stored in a 4D tensor of shape (128, 256, 256, 3). Here the color channel represents RGB colors. As another example, a batch of 128 grayscale images could be stored in a 4D tensor of shape (128, 256, 256, 1), where the color could be coded as 8-bit integers.
In the second example, the last dimension is a vector containing only one element. It would then seem possible to use a 3D tensor of shape (128, 256, 256) instead.
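The relationship between the two shapes can be sketched with NumPy (a minimal illustration with dummy data, not taken from any real dataset):

```python
import numpy as np

# A batch of 128 grayscale 256x256 images as a 4D tensor
batch_4d = np.zeros((128, 256, 256, 1), dtype=np.uint8)

# Dropping the trailing singleton channel axis gives the 3D version
batch_3d = np.squeeze(batch_4d, axis=-1)
print(batch_3d.shape)  # (128, 256, 256)

# expand_dims restores the channel axis
restored = np.expand_dims(batch_3d, axis=-1)
print(restored.shape)  # (128, 256, 256, 1)
```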
Here comes my question: I would like to know if there is a difference between using a 3D tensor rather than a 4D tensor as the training input of a deep-learning model in Keras.
EDIT: My input layer is a Conv2D.
Upvotes: 6
Views: 4206
Reputation: 405
If you take a look at the Keras documentation of the Conv2D layer here, you will see that the shape of the input tensor must be 4D:

Conv2D layer input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is "channels_first", or 4D tensor with shape: (batch, rows, cols, channels) if data_format is "channels_last".
So the 4th dimension of the shape is mandatory, even if it is only 1, as for a grayscale image. In fact, it is not a matter of performance gain or simplicity: a 4D tensor is simply the mandatory shape of the input argument.
Hope it answers your question.
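To make this concrete, here is a minimal sketch (assuming TensorFlow's bundled Keras; the image sizes and layer parameters are arbitrary dummy values) showing a 3D batch of grayscale images being given the trailing channel axis that Conv2D requires:

```python
import numpy as np
from tensorflow import keras

# Dummy batch of 16 grayscale 28x28 images stored as a 3D tensor
images_3d = np.random.rand(16, 28, 28).astype("float32")

# Conv2D needs a channel axis, so add the trailing "1" dimension
images_4d = images_3d.reshape(16, 28, 28, 1)

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),  # channels_last: (rows, cols, channels)
    keras.layers.Conv2D(8, kernel_size=3, activation="relu"),
])

features = model.predict(images_4d, verbose=0)
print(features.shape)  # (16, 26, 26, 8)
```

Passing `images_3d` directly would raise a shape error, since the layer expects 4D input including the batch dimension.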
Upvotes: 3