Reputation: 609
I want to use a 2D convolutional layer in my network and feed it pictures as input. I have a batch of pictures, which means an ndim=3 matrix, for example:
Dimensions of my input:
[10, 6, 7]
The value 10 is the batch size, and the other two values are the image size. So what is the fourth dimension that the conv2d layer requires?
Here are the relevant lines of code:
self.state_size = [6, 7]
self.inputs_ = tf.placeholder(tf.float32, shape=[None, *self.state_size], name="inputs_")
# Conv2D layer 1
self.conv1 = tf.layers.conv2d(inputs = self.inputs_,
filters = 4,
kernel_size = [4, 4],
strides = [1, 1],
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d())
Here is the error I get:
Input 0 of layer conv2d_1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 6, 7]
Upvotes: 5
Views: 2773
Reputation: 5412
Here is a short explanation of the dimensions of the input tensor to a 2D convolutional layer.
tensor_shape = (BATCH_SIZE, HEIGHT, WIDTH, CHANNELS)
The fourth dimension is the channels (color) dimension.
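For the input in the question, which has no channel axis, one way to reach four dimensions is to append a channel axis of size 1. The sketch below uses NumPy as a stand-in so it runs without a TensorFlow session; the array name `batch` is made up for illustration. Inside the graph, `tf.expand_dims(self.inputs_, axis=-1)` should play the same role for the placeholder.

```python
import numpy as np

# Stand-in for the batch from the question: 10 single-channel 6x7 images.
batch = np.zeros((10, 6, 7), dtype=np.float32)

# Append a channel axis of size 1 so the array has the 4 dims conv2d expects.
batch_nhwc = np.expand_dims(batch, axis=-1)

print(batch.ndim, batch.shape)            # 3 (10, 6, 7)
print(batch_nhwc.ndim, batch_nhwc.shape)  # 4 (10, 6, 7, 1)
```

The same idea applies to the placeholder shape: declaring it as [None, 6, 7, 1] and feeding arrays shaped like `batch_nhwc` would avoid the ndim error.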
The long answer:
A 2D convolutional layer expects its input to have four dimensions. There are two image tensor formats in TensorFlow:
1. channels_last (NHWC) - Dimensions are ordered as (BATCH_SIZE, HEIGHT, WIDTH, CHANNELS).
2. channels_first (NCHW) - Dimensions are ordered as (BATCH_SIZE, CHANNELS, HEIGHT, WIDTH).
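If the data is stored in one layout and the layer is configured for the other, the axes can be permuted. A small sketch with NumPy, assuming a hypothetical batch of ten 6x7 RGB images (the variable names are illustrative only):

```python
import numpy as np

# Hypothetical batch in channels_last layout: (batch, height, width, channels).
nhwc = np.zeros((10, 6, 7, 3), dtype=np.float32)

# Move the channel axis to position 1 to get channels_first (NCHW).
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# Move it back to the end to recover channels_last (NHWC).
back = np.transpose(nchw, (0, 2, 3, 1))

print(nchw.shape)  # (10, 3, 6, 7)
print(back.shape)  # (10, 6, 7, 3)
```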
Batch size: In TensorFlow (and possibly in other machine learning libraries), once you have defined your model you have two options for feeding data to it. The first option is to feed the data points one at a time. The second option is to feed N data points at a time. The batch size dimension is what makes this possible.
Width: This dimension specifies the width of the image.
Height: This dimension specifies the height of the image.
Channels: In an RGB image, the channel dimension holds the RGB values, so it has size 3. A grayscale image has a single channel.
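To make the batch dimension concrete, here is a rough sketch of the two feeding options using NumPy arrays. The list `images` is invented for illustration and stands for ten separate 6x7 images.

```python
import numpy as np

# Ten hypothetical 6x7 images; image i is filled with the value i.
images = [np.full((6, 7), i, dtype=np.float32) for i in range(10)]

# Option 1: one data point at a time, as a batch of size 1.
single = images[0][np.newaxis, ...]

# Option 2: N data points at once, stacked along a new leading batch axis.
batch = np.stack(images, axis=0)

print(single.shape)  # (1, 6, 7)
print(batch.shape)   # (10, 6, 7)
```

Either array would still need a channel axis before being passed to a conv2d layer.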
To specify the data format of your input image tensor, the conv2d layer accepts a data_format argument. The default is "channels_last". You can find more in the tf.layers.conv2d documentation. The following code shows an input with the channels_last data format:
inputs_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
conv1 = tf.layers.conv2d(inputs_, 32, (3, 3), data_format="channels_last")
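For channels_first, the channel axis comes right after the batch axis:
inputs_ = tf.placeholder(tf.float32, [None, 3, 32, 32])
conv1 = tf.layers.conv2d(inputs_, 32, (3, 3), data_format="channels_first")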
Upvotes: 6