Why does the conv2d layer require an ndim=4 input?

Question

I want to use a 2D convolutional layer in my network, and I would like to give it pictures as input. I have a batch of pictures, which means an ndim=3 tensor, for example:

dimension of my input:

[10, 6, 7]

The value 10 is the batch size and the other two values are the image size. So what is the fourth dimension that the conv2d layer requires?

Here are the relevant lines of code:

self.state_size = [6, 7]

self.inputs_    = tf.placeholder(tf.float32, shape=[None, *self.state_size],  name="inputs_")


# Conv2D layer 1
self.conv1 = tf.layers.conv2d(inputs=self.inputs_,
                              filters=4,
                              kernel_size=[4, 4],
                              strides=[1, 1],
                              kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d())

Here is the error I get:

Input 0 of layer conv2d_1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 6, 7]

Mitiku · Accepted Answer

Here is a short explanation of the dimensions of the input tensor to a 2D convolutional layer.

 tensor_shape = (BATCH_SIZE, HEIGHT, WIDTH, CHANNELS)

The fourth dimension is the channels (color) dimension.

The long answer: a 2D convolutional layer expects its input to have four dimensions. There are two image tensor formats in TensorFlow.
1. channels_last (NHWC) - Dimensions are ordered as (BATCH_SIZE, HEIGHT, WIDTH, CHANNELS).
2. channels_first (NCHW) - Dimensions are ordered as (BATCH_SIZE, CHANNELS, HEIGHT, WIDTH).
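The two layouts hold the same data with the axes in a different order, so converting between them is a transpose. The sketch below uses NumPy arrays as stand-ins for the tensors; the array names and the 32x32x3 image size are only illustrative assumptions.

```python
import numpy as np

# Hypothetical batch of 10 RGB images in channels_last (NHWC) layout.
nhwc = np.zeros((10, 32, 32, 3), dtype=np.float32)

# NHWC -> NCHW: move the channels axis (index 3) to position 1.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# NCHW -> NHWC: move the channels axis (index 1) back to the end.
back = np.transpose(nchw, (0, 2, 3, 1))

print(nhwc.shape)  # (10, 32, 32, 3)
print(nchw.shape)  # (10, 3, 32, 32)
print(back.shape)  # (10, 32, 32, 3)
```

Inside a TensorFlow graph, tf.transpose with the same permutation should do the equivalent reordering.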

Batch Size dimension

In TensorFlow (and possibly in other machine learning libraries), once you have defined your model you have two options for feeding data to it. The first option is to feed the data points one at a time. The second option is to feed N data points at a time. The batch size dimension is what makes this possible: a single data point is simply a batch of size 1.
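As a rough illustration of the batch dimension, the sketch below builds a batch of one image and a batch of ten images with NumPy. The image size and variable names are assumptions chosen for the example, not taken from the question.

```python
import numpy as np

# One hypothetical RGB image: (HEIGHT, WIDTH, CHANNELS), no batch axis yet.
image = np.zeros((32, 32, 3), dtype=np.float32)

# Feeding one data point: add a leading batch axis of size 1.
single = image[np.newaxis, ...]

# Feeding N data points: stack N images along a new leading batch axis.
images = [np.zeros((32, 32, 3), dtype=np.float32) for _ in range(10)]
batch = np.stack(images, axis=0)

print(single.shape)  # (1, 32, 32, 3)
print(batch.shape)   # (10, 32, 32, 3)
```

Both arrays have ndim=4, so either could be fed to a placeholder declared with a batch size of None.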

Width dimension

This dimension specifies the width of the image.

Height dimension

This dimension specifies the height of the image.

Channels dimension

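The channels dimension holds the per-pixel values. For an RGB image it has size 3 (one entry each for red, green, and blue); for a grayscale or other single-channel image it has size 1. The [None, 6, 7] input in the question has no channels dimension, so it needs one of size 1, giving [None, 6, 7, 1].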
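For the input in the question, which has shape [10, 6, 7], one way to get the missing fourth dimension is to append a channels axis of size 1. The sketch below shows the idea on a NumPy array standing in for the real data; treating the 6x7 input as single-channel is an assumption about the asker's data.

```python
import numpy as np

# Hypothetical batch matching the question: 10 single-channel 6x7 inputs.
batch = np.zeros((10, 6, 7), dtype=np.float32)

# Append a trailing channels axis of size 1 (channels_last / NHWC layout).
batch_nhwc = np.expand_dims(batch, axis=-1)

print(batch.ndim, batch.shape)            # 3 (10, 6, 7)
print(batch_nhwc.ndim, batch_nhwc.shape)  # 4 (10, 6, 7, 1)
```

In the TensorFlow code itself, the same effect should be achievable either by declaring the placeholder with shape [None, 6, 7, 1] and reshaping the data before feeding it, or by applying tf.expand_dims(self.inputs_, axis=-1) before the conv2d layer.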

EDIT:

To specify the data format of your input image tensor, the conv2d layer accepts a data_format argument. The default is "channels_last". You can find more here. The following code shows an input with the channels_last data format:

inputs_ = tf.placeholder(tf.float32, [None, 32, 32, 3])

conv1 = tf.layers.conv2d(inputs_, 32, (3, 3), data_format="channels_last")

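For channels_first, the input tensor must also have its channels dimension second:

inputs_ = tf.placeholder(tf.float32, [None, 3, 32, 32])

conv1 = tf.layers.conv2d(inputs_, 32, (3, 3), data_format="channels_first")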