Reputation: 344
By convention an image tensor is 3D: one dimension for its height, one for its width, and a third one for its color channels. Its shape looks like (height, width, channels).
For instance, a batch of 128 color images of size 256x256 could be stored in a 4D tensor of shape (128, 256, 256, 3). Here the color channel represents RGB colors. As another example, a batch of 128 grayscale images could be stored in a 4D tensor of shape (128, 256, 256, 1), where the color could be coded as 8-bit integers.
In the second example, the last dimension is a vector containing only one element. It would then seem possible to use a 3D tensor of shape (128, 256, 256) instead.
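The relationship between the two shapes can be sketched with NumPy (a minimal illustration with dummy data, not taken from any real dataset):

```python
import numpy as np

# A batch of 128 grayscale 256x256 images as a 4D tensor
batch_4d = np.zeros((128, 256, 256, 1), dtype=np.uint8)

# Dropping the trailing singleton channel axis gives the 3D version
batch_3d = np.squeeze(batch_4d, axis=-1)
print(batch_3d.shape)  # (128, 256, 256)

# expand_dims restores the channel axis
restored = np.expand_dims(batch_3d, axis=-1)
print(restored.shape)  # (128, 256, 256, 1)
```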
Here comes my question: I would like to know if there is a difference between using a 3D tensor rather than a 4D tensor as the training input of a deep-learning model in Keras.
EDIT: My input layer is a Conv2D.
Upvotes: 6
Views: 4206
Reputation: 405
If you take a look at the Keras documentation of the Conv2D layer here, you will see that the shape of the input tensor must be 4D:

Conv2D layer input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is "channels_first", or 4D tensor with shape: (batch, rows, cols, channels) if data_format is "channels_last".
So the 4th dimension of the shape is mandatory, even if it is only 1, as for a grayscale image. In fact, it is not a matter of performance gain or simplicity: a 4D tensor is simply the mandatory shape of the input argument.
Hope it answers your question.
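To make this concrete, here is a minimal sketch (assuming TensorFlow's bundled Keras; the image sizes and layer parameters are arbitrary dummy values) showing a 3D batch of grayscale images being given the trailing channel axis that Conv2D requires:

```python
import numpy as np
from tensorflow import keras

# Dummy batch of 16 grayscale 28x28 images stored as a 3D tensor
images_3d = np.random.rand(16, 28, 28).astype("float32")

# Conv2D needs a channel axis, so add the trailing "1" dimension
images_4d = images_3d.reshape(16, 28, 28, 1)

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),  # channels_last: (rows, cols, channels)
    keras.layers.Conv2D(8, kernel_size=3, activation="relu"),
])

features = model.predict(images_4d, verbose=0)
print(features.shape)  # (16, 26, 26, 8)
```

Passing `images_3d` directly would raise a shape error, since the layer expects 4D input including the batch dimension.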
Upvotes: 3