Reputation: 19
Let's say the input to my conv layer is 256 x 256 x 64 and I use 32 filters of size 3 x 3. Why is the output depth 32 and not 64? How is the convolution carried out along the depth axis?
Upvotes: 1
Views: 1902
Reputation: 299
The output depth depends on the number of filters in the current convolution layer.
When you define a filter, you provide only its spatial dimensions (X, Y); the last dimension of the filter is set by the number of input channels.
In a convolution, each filter (3, 3, z) slides over the input (Xin, Yin, z). At each position it is multiplied element-wise with the patch under it and the products are summed over all three axes, including depth, which gives an output of dimension (Xout, Yout, 1). Each filter produces a single output layer, which is why the output depth equals the number of filters in that convolution layer.
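To make the depth handling concrete, here is a minimal NumPy sketch, not a reference implementation. It assumes stride 1 and no padding, uses a 16 x 16 spatial size instead of 256 x 256 to keep it quick, and the function name `conv2d_valid` is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x, filters):
    """x: (H, W, C_in); filters: (F, k, k, C_in). Stride 1, no padding."""
    k = filters.shape[1]
    # All k x k patches of the input: shape (H_out, W_out, C_in, k, k)
    windows = sliding_window_view(x, (k, k), axis=(0, 1))
    # Multiply every patch with every filter and sum over k, k and C_in.
    # The depth axis is summed away, so each filter yields one channel.
    return np.einsum("ijckl,fklc->ijf", windows, filters)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 64))          # input with depth 64
filters = rng.standard_normal((32, 3, 3, 64))  # 32 filters, each 3 x 3 x 64
out = conv2d_valid(x, filters)
print(out.shape)  # (14, 14, 32)
```

The output depth is 32 because there are 32 filters; the input depth of 64 only fixes the third dimension of each filter.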
Note: If you are interested in the output size of the layer, you can calculate it using the formula below. This slide explains how convolution works.
X_out = (Xin + 2*Pad - kernel)/Stride + 1
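As a quick check of the arithmetic, the usual output-size formula, (Xin + 2*Pad - kernel)/Stride + 1, can be written as a small helper. This is only a sketch: the name `conv_output_size` is hypothetical, and it uses floor division, which is the common convention when the stride does not divide evenly.

```python
def conv_output_size(n_in, kernel, pad=0, stride=1):
    """Output length along one spatial axis of a convolution."""
    return (n_in + 2 * pad - kernel) // stride + 1


print(conv_output_size(256, 3))                   # 254
print(conv_output_size(256, 3, pad=1))            # 256
print(conv_output_size(256, 3, pad=1, stride=2))  # 128
```

For the 256 x 256 x 64 input in the question, a 3 x 3 filter with no padding and stride 1 gives 254 along each spatial axis.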
You may want to watch the lecture video by Prof. Fei-Fei Li. Here is the complete deep learning course by her.
Upvotes: 1
Reputation: 1272
In a CNN, each filter is defined by its length and width (3 x 3). Its connectivity along the depth axis is always equal to the depth of the input.
Taking your example:
You have 32 filters, each of size (3x3). So a neuron in each filter looks at a (3x3x64) patch of the input, and the number of neurons along each side of a filter's output layer (without zero padding and with stride=1) is (256-3+1) = 254.
So Neuron1 in each layer, i.e. Layer1, Layer2, ..., Layer32, looks at the same (3x3x64) patch of the input. This makes the output size of the convolutional layer (254x254x32).
This is how the number of filters determines the output depth of a convolutional layer.
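The "same patch, 32 neurons" picture can be sketched for a single spatial position. This is an illustration with random values, assuming stride 1 and no padding; the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 256, 64))        # input volume
filters = rng.standard_normal((32, 3, 3, 64))  # 32 filters, each 3 x 3 x 64

# The top-left patch that the first neuron of every filter looks at
patch = x[0:3, 0:3, :]                         # shape (3, 3, 64)

# Each filter reduces the whole 3 x 3 x 64 patch to one number,
# so one spatial position yields a column of 32 values.
column = np.tensordot(filters, patch, axes=3)  # shape (32,)

positions = 256 - 3 + 1                        # 254 positions per side
print(column.shape, (positions, positions, filters.shape[0]))
```

Repeating this for every patch position fills an output volume whose depth is the number of filters.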
To get a clearer picture of what I have discussed here, you can refer to this link. It explains CNNs far more intuitively and mathematically.
Upvotes: 1