xiaofei

Reputation: 107

Why does LeNet-5 use a 32×32 image as input?

I know that the handwritten digit images in the MNIST dataset are 28×28, but why is the input to LeNet-5 32×32?

Upvotes: 8

Views: 3159

Answers (1)

runDOSrun

Reputation: 10985

Your question is answered in the original paper:
A convolution layer always produces feature maps smaller than its input, and this holds for the first layer as well, whose input is the image itself:

Layer C1 is a convolutional layer with 6 feature maps. Each unit in each feature map is connected to a 5x5 neighborhood in the input. The size of the feature maps is 28x28 which prevents connection from the input from falling off the boundary.

This means that applying a 5x5 neighborhood to a 32x32 input gives you 6 feature maps of size 28x28, because the kernel cannot be centered on pixels near the image boundary (32 - 5 + 1 = 28).
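If you want to verify the arithmetic yourself, here is a minimal sketch (not from the original answer) using PyTorch's nn.Conv2d with LeNet-5's C1 settings, assuming a single-channel 32x32 input:

```python
import torch
import torch.nn as nn

# C1 of LeNet-5: 6 feature maps, 5x5 kernels, no padding ("valid" convolution).
c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

x = torch.zeros(1, 1, 32, 32)   # one 32x32 single-channel image
print(c1(x).shape)              # torch.Size([1, 6, 28, 28]) -> 32 - 5 + 1 = 28
```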

Of course, they could have made an exception for the first layer. The reason they still use 32x32 images is:

The input is a 32x32 pixel image. This is significantly larger than the largest character in the database (at most 20x20 pixels centered in a 28x28 field). The reason is that it is desirable that potential distinctive features such as stroke end-points or corner can appear in the center of the receptive field of the highest-level feature detectors.
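In practice this just means padding the 28x28 MNIST images with background pixels before feeding them to the network. The paper doesn't prescribe the exact preprocessing code, but a minimal NumPy sketch, assuming 2 pixels of zero padding on each side, would look like this:

```python
import numpy as np

digit = np.zeros((28, 28), dtype=np.float32)  # stand-in for a 28x28 MNIST digit

# Pad 2 background pixels on every side: 28 + 2 + 2 = 32.
padded = np.pad(digit, pad_width=2, mode="constant", constant_values=0)
print(padded.shape)  # (32, 32)
```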

Upvotes: 3
