Understanding Channels in Convolutional Neural Network (CNN) input shape and output shape

Question

I was trying to follow this tutorial https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

The baseline model contains model.add(Conv2D(32, (3, 3), input_shape=(3, 150, 150)))

I don't quite follow the output shape here. If the input shape is 3x150x150 and the kernel size is 3x3, isn't the output shape 3x148x148 (assuming no padding)? However, according to the Keras docs:

Output shape: 4D tensor with shape: (batch, filters, new_rows, new_cols)

That suggests to me that the output shape will be 32x148x148. Is this understanding correct? If so, where do the additional channels come from?

Amir · Accepted Answer

If the input shape is (3, 150, 150), then after applying the Conv2D layer the output shape is (?, 32, 148, 148). You can check this with the following example:

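from keras.layers import Input, Conv2D

# Symbolic input: 3 channels, 150 rows, 150 columns (channels first)
inps = Input(shape=(3, 150, 150))
# 32 filters of spatial size 3x3, no padding by default
conv = Conv2D(32, (3, 3), data_format='channels_first')(inps)
print(conv)
# >> Tensor("conv2d/BiasAdd:0", shape=(?, 32, 148, 148), dtype=float32)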

The first dimension, shown as ?, is the batch size.
The second dimension is the number of filters (32), which is the number of output channels.
The last two are the output height and width (148 each).
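As a quick sanity check on the 148, here is a small sketch of the usual output-size arithmetic along one spatial axis. The helper name conv_output_size is made up for illustration (it is not a Keras function), and it assumes symmetric padding and no dilation.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis.

    Hypothetical helper for illustration; assumes symmetric padding
    and no dilation.
    """
    return (n + 2 * padding - kernel) // stride + 1


# 150 input rows/cols, 3x3 kernel, stride 1, no padding
rows = conv_output_size(150, 3)
cols = conv_output_size(150, 3)
print(rows, cols)  # 148 148
```

Note that the number of input channels (3) does not appear in this arithmetic at all; it only affects the depth of each filter.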

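How do the channels change from 3 to 32? First assume we have an RGB image (3 channels) and a single output channel. The layer then has one filter of shape (3, 3, 3). It spans all three input channels, and at each position its 27 products are summed (plus a bias) into one number, so the three input channels collapse into a single output feature map.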

When you use filters=32 and kernel_size=(3,3), you are creating 32 different filters, each of them with shape (3,3,3). Each filter produces its own feature map, and the 32 feature maps are stacked to form the 32 output channels. Note that, according to the Keras docs, all kernels are initialized with glorot_uniform by default.

Image from this blog post.
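To make the 3 to 32 step concrete, here is a rough NumPy sketch of what a channels-first convolution with no padding computes. It is not how Keras implements Conv2D; the variable names and the (filters, in_channels, rows, cols) kernel layout are my own choices for readability, and random values stand in for the image and the learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumed for illustration): one RGB image and 32 filters.
image = rng.standard_normal((3, 150, 150))    # (channels, rows, cols)
kernels = rng.standard_normal((32, 3, 3, 3))  # (filters, in_channels, k_rows, k_cols)
bias = np.zeros(32)                           # one bias per filter

out_rows = 150 - 3 + 1
out_cols = 150 - 3 + 1
output = np.zeros((32, out_rows, out_cols))

# For each of the 9 kernel offsets, take the shifted image window and
# accumulate its contribution, summing over the 3 input channels.
for i in range(3):
    for j in range(3):
        patch = image[:, i:i + out_rows, j:j + out_cols]  # (3, 148, 148)
        output += np.einsum('fc,chw->fhw', kernels[:, :, i, j], patch)

output += bias[:, None, None]

print(output.shape)               # (32, 148, 148)
print(kernels.size + bias.size)   # 896 trainable parameters
```

Every filter sees all 3 input channels and sums over them, so the input channel count disappears from the output shape and the filter count takes its place.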
