Reputation: 934
I have been studying convolutional neural network architecture. I am horrendously confused about the part where a 3D kernel acts on the 3D input image (strictly it is 4D, given that we have a stack of those images, but I'll ignore that to keep the explanation simpler). I know the internet is full of material on this, but I can't find an exact answer about the matrix multiplication part.
To make it easier for everyone to understand: can someone show me an actual multiplication, i.e. how the convolution of a (5,5,3) matrix (our kernel) over a (28,28,3) matrix (our RGB image) takes place and outputs a 2D array?
Also, please show (with a detailed picture) how those numerous 2D arrays get flattened and connected to a single fully connected layer.
I know that the final layer of pooled 2D arrays is flattened. But since there are, say, 64 such 2D arrays, even if we flatten each one we will have 64 flattened 1D arrays. So how do these end up connecting to the next fully connected layer? (Picture please.)
Upvotes: 1
Views: 817
Reputation: 131
You have multiple questions in one. I will answer the one about how the convolution takes place. Short answer: it is not a matrix multiplication.
Step 1) You slide a window of size (5,5,3) over your RGB image, carving out subimages of that size. By construction, these subimages have exactly the same dimensions as the kernel.
Step 2) You multiply the values of each subimage with the values of the kernel, component-wise. The output of that is again a (5,5,3) subimage, "scaled" by the values of the kernel.
Step 3) You add all the values of the "scaled" (5,5,3) subimage together (effectively squishing the dimensions) into a single value. That value is the output for this window position; repeating this for every window position fills in the 2D output array.
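The three steps above can be sketched in NumPy roughly as follows. This is only an illustration under some assumptions that the question does not state: a stride of 1, no padding, no bias term, and random numbers standing in for the image and kernel values. The function name `conv2d_single` is made up for this example.

```python
import numpy as np

# Random stand-ins for the RGB image and the kernel (illustrative values only).
rng = np.random.default_rng(0)
image = rng.random((28, 28, 3))   # height, width, channels
kernel = rng.random((5, 5, 3))    # same channel depth as the image


def conv2d_single(image, kernel):
    """Apply one 3D kernel to a 3D image (stride 1, no padding, no bias)."""
    ih, iw, ic = image.shape
    kh, kw, kc = kernel.shape
    assert ic == kc, "kernel depth must match the number of image channels"

    out_h, out_w = ih - kh + 1, iw - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]  # Step 1: carve out a (5,5,3) subimage
            scaled = patch * kernel               # Step 2: component-wise multiplication
            out[i, j] = scaled.sum()              # Step 3: add everything into one value
    return out


feature_map = conv2d_single(image, kernel)
print(feature_map.shape)  # (24, 24)
```

With these assumptions the window fits in 28 - 5 + 1 = 24 positions along each axis, so one kernel yields a single (24, 24) array. Each additional kernel would yield one more such 2D array.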
Upvotes: 1