Reputation: 591
I want to understand why an image with 3 channels, e.g. 6*6*3, when convolved with a 3*3*3 filter, produces only a 4*4 output and not 4*4*3.
Upvotes: 0
Views: 1717
Reputation: 2174
One way to think about it is to count how many distinct 3 x 3 x 3 cubes you can cut out of a 6 x 6 x 3 rectangular prism.
Also, let's make the question simpler.
Let's say you have a 2 x 2 input image patch and you want to do a 2 x 2 convolution. The number of positions where the 2 x 2 kernel fits inside the patch is 1 x 1 (1 in the x direction, 1 in the y direction).
Let's extend it: 4 x 2 input image with a 2 x 2 conv. Unique convolution count = 3 x 1 (3 in the x direction, 1 in the y direction).
Let's extend it: 4 x 4 input image with a 2 x 2 conv. Unique convolution count = 3 x 3.
Let's extend it: 4 x 4 x 2 input image with a 2 x 2 x 2 conv. Unique convolution count = 3 x 3 x 1.
Let's extend it: 6 x 6 x 3 input image with a 3 x 3 x 3 conv. Unique convolution count = 4 x 4 x 1, i.e. a 4 x 4 output with a single channel, because the kernel is exactly as deep as the image and has only one position along the channel axis.
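The counting above follows one rule per axis: with no padding and a stride of 1, a kernel of size k fits into an input of size n in n - k + 1 positions. A minimal sketch of that rule in Python (the function name `valid_output_shape` is made up for illustration, and no padding with stride 1 is an assumption):

```python
def valid_output_shape(input_shape, kernel_shape):
    """Count the kernel positions along each axis (no padding, stride 1)."""
    if len(input_shape) != len(kernel_shape):
        raise ValueError("input and kernel must have the same number of axes")
    if any(k > n for n, k in zip(input_shape, kernel_shape)):
        raise ValueError("kernel does not fit inside the input")
    # Along each axis a kernel of size k fits in n - k + 1 places.
    return tuple(n - k + 1 for n, k in zip(input_shape, kernel_shape))


# The same cases as in the list above.
examples = [
    ((2, 2), (2, 2)),
    ((4, 2), (2, 2)),
    ((4, 4), (2, 2)),
    ((4, 4, 2), (2, 2, 2)),
    ((6, 6, 3), (3, 3, 3)),
]
for input_shape, kernel_shape in examples:
    print(input_shape, kernel_shape, valid_output_shape(input_shape, kernel_shape))
```

The last line printed is `(6, 6, 3) (3, 3, 3) (4, 4, 1)`: the channel axis collapses to 1 because 3 - 3 + 1 = 1.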
Upvotes: 1
Reputation: 433
When you apply a convolution, each output value is the sum of the kernel weights multiplied elementwise by the input values they cover. In this case you are not padding your input, which means you only output values at positions where the kernel lies entirely inside the input. If you take the dimensions of your input as (x, y, z), you can see that your kernel is smaller than the input in the x and y dimensions, but equal to it in z. That means you can slide the kernel in both the x and y directions, producing an output for each location, but in the z direction it has nowhere to slide, so it produces just a single output (which is the sum across all channels).
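To make this concrete, here is a rough NumPy sketch of that sliding sum. It assumes no padding, stride 1, and no kernel flip (the cross-correlation convention used by most deep learning libraries); `conv_valid` is an illustrative name, not a library function, and the random data is arbitrary.

```python
import numpy as np


def conv_valid(image, kernel):
    """Slide a (kh, kw, C) kernel over an (H, W, C) image with no padding."""
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    if kc != C:
        raise ValueError("kernel depth must match the number of image channels")
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The patch spans all channels, so there is nothing to slide over in z.
            patch = image[i:i + kh, j:j + kw, :]
            # Multiply elementwise and sum over x, y and the channels.
            out[i, j] = np.sum(patch * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))
kernel = rng.standard_normal((3, 3, 3))

out = conv_valid(image, kernel)
print(out.shape)  # (4, 4)

# Convolving each channel separately gives the 4 x 4 x 3 array the question expected;
# summing it over the channel axis recovers the single 4 x 4 output.
per_channel = np.stack(
    [conv_valid(image[:, :, c:c + 1], kernel[:, :, c:c + 1]) for c in range(3)],
    axis=-1,
)
print(per_channel.shape)  # (4, 4, 3)
print(np.allclose(per_channel.sum(axis=-1), out))  # True
```

So the 4*4*3 intermediate does exist conceptually, one 4*4 map per channel, but a single 3*3*3 filter adds those three maps together into one 4*4 output.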
Upvotes: 1