Dimensions of a convolution?

Question

I have some questions about how this convolution is calculated and what its output dimensions are. I'm familiar with simple convolutions with an nxm kernel, including strides, dilations, and padding, so that's not a problem, but these dimensions seem odd to me. Since the model I'm using is the well-known onnx-mnist, I assume it is correct.

So, my point is:

If the input has dimensions 1x1x28x28, how can the output be 1x8x28x28?
W denotes the kernel. How can it be 8x1x5x5? As far as I know, the first dimension is the batch size, but here I'm just doing inference on 1 input. Does this make sense?
I'm implementing this convolution operator from scratch, and so far it works for a 1x1x28x28 input and a 1x1x5x5 kernel, but that extra dimension doesn't make sense to me.

Attached is the convolution that I'm trying to implement. I hope it's not too ONNX-specific.

mrzo · Accepted Answer

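I can't see the code you are using, but my guess is that 8 is the number of kernels. That means you apply 8 different 5x5 kernels to your input, with a batch size of 1. That is how you get 1x8x28x28 in the output: the 8 is the number of activation maps (one per kernel). Note that the first dimension of the kernel tensor is not a batch size; only the first dimension of the input and output is. The spatial size staying at 28x28 with a 5x5 kernel also implies that the input is padded (2 pixels on each side at stride 1); without padding the output would shrink to 24x24.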

The numbers of your kernel dimensions (8x1x5x5) explained:

8: Number of different filters/kernels (this becomes the number of output maps per image)
1: Number of input channels. If your input image were RGB instead of grayscale, this would be 3 instead of 1.
5: First spatial dimension (kernel height)
5: Second spatial dimension (kernel width)
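
To make the shapes concrete, here is a minimal NumPy sketch of a naive convolution in that layout. It is not the ONNX implementation: the function name `conv2d_nchw` and its `pad`/`stride` parameters are made up for illustration, the bias is left out, and I am assuming symmetric zero padding of 2, which is what keeps a 28x28 input at 28x28 with a 5x5 kernel. Like most deep learning frameworks, it does not flip the kernel (it is technically a cross-correlation).

```python
import numpy as np

def conv2d_nchw(x, w, pad=0, stride=1):
    """Naive convolution. x: (N, C_in, H, W), w: (C_out, C_in, kH, kW)."""
    n, c_in, h, w_in = x.shape
    c_out, c_in_w, kh, kw = w.shape
    assert c_in == c_in_w, "kernel's 2nd dim must match the input channels"

    # Zero-pad only the two spatial dimensions.
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))

    h_out = (h + 2 * pad - kh) // stride + 1
    w_out = (w_in + 2 * pad - kw) // stride + 1
    y = np.zeros((n, c_out, h_out, w_out), dtype=x.dtype)

    for b in range(n):              # each image in the batch
        for o in range(c_out):      # each kernel gives one output map
            for i in range(h_out):
                for j in range(w_out):
                    r, c = i * stride, j * stride
                    # The patch spans ALL input channels: (C_in, kH, kW).
                    patch = xp[b, :, r:r + kh, c:c + kw]
                    y[b, o, i, j] = np.sum(patch * w[o])
    return y

x = np.random.rand(1, 1, 28, 28).astype(np.float32)  # one grayscale image
w = np.random.rand(8, 1, 5, 5).astype(np.float32)    # 8 kernels, 1 channel, 5x5
y = conv2d_nchw(x, w, pad=2)
print(y.shape)  # (1, 8, 28, 28)
```

If your existing code already handles a 1x1x5x5 kernel, the only new piece is the loop over `o`: run the same operation once per kernel and stack the 8 results along the second output dimension.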

