How exactly does matrix multiplication of a 3D kernel and a 3D image (say, RGB) take place to give a 2D output?

Question

I have been studying convolutional neural network architecture. I am very confused about the step where a 3D kernel acts on a 3D input image (strictly, the input is 4D because we have a stack of those images, but I am ignoring that to keep the explanation simple). I know the internet is full of material on this, but I can't find an exact answer about the matrix multiplication part.

To make this easier for everyone to understand, can someone show an actual worked multiplication of how a (5,5,3) matrix (our kernel) is convolved over a (28,28,3) matrix (our RGB image), outputting a 2D array?
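To pin down the operation being asked about, here is a minimal NumPy sketch. It assumes stride 1, no padding, no bias, and a single filter, none of which are stated above, and it uses random values in place of a real image. The function name `conv2d_single_filter` is made up for illustration. At each position the (5,5,3) kernel is multiplied element-wise with a (5,5,3) patch of the image and all 75 products are summed into one number, which is why the channel axis disappears and the result is 2D.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28, 3))   # height x width x RGB channels (random stand-in)
kernel = rng.random((5, 5, 3))    # one filter; its depth matches the image depth


def conv2d_single_filter(image, kernel):
    """Slide one 3D kernel over a 3D image (stride 1, no padding, no bias)."""
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    assert C == kc, "kernel depth must equal image depth"
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]   # (5, 5, 3) window
            out[i, j] = np.sum(patch * kernel)     # 75 products summed to a scalar
    return out


feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)  # (24, 24)
```

With 64 such filters, the same procedure would run 64 times and produce 64 separate (24, 24) arrays, one per filter.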

Also, please show (with a detailed picture) how those numerous 2D arrays get flattened and connected to a single fully connected layer.

I know that the final layer of pooled 2D arrays is flattened. But since there are, say, 64 of these 2D arrays, flattening each one gives 64 separate flattened 1D arrays. So how do these end up connected to the next fully connected layer? (Picture, please.)
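The flattening step can be sketched in NumPy as follows. The sizes are assumptions chosen for illustration (64 pooled maps of 12x12, a dense layer with 10 units, random weights), and the arrays are stored channel-first so that the reshape matches "flatten each map, then join them". The point is that the 64 flattened arrays are concatenated end to end into one long vector, and every unit of the fully connected layer has one weight per entry of that vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 pooled feature maps, each 12 x 12 (channel-first layout).
pooled = rng.random((64, 12, 12))

# Flatten each map to 144 values, then join all 64 end to end.
per_map = np.concatenate([m.ravel() for m in pooled])

# The usual one-step flatten gives the same vector for this layout.
flat = pooled.reshape(-1)              # length 64 * 12 * 12 = 9216

# Fully connected layer with 10 units (assumed): one weight per (input, unit) pair.
num_units = 10
weights = rng.random((flat.size, num_units))
bias = np.zeros(num_units)
dense_out = flat @ weights + bias      # (9216,) @ (9216, 10) -> (10,)

print(flat.shape, dense_out.shape)     # (9216,) (10,)
```

So the fully connected layer never sees 64 separate arrays; it sees a single 9216-element vector, and this is the one place where an ordinary matrix multiplication appears.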

