Thanh Le Hai Hoang

Reputation: 1403

How to calculate the output size of a convolutional layer in YOLO?

YOLO Architecture

This is the architecture of YOLO. I am trying to calculate the output size of each layer myself, but I can't get the size as described in the paper.

For example, the first conv layer takes a 448x448 input and uses a 7x7 filter with stride 2. According to the equation W2 = (W1 − F + 2P)/S + 1 = (448 − 7 + 0)/2 + 1 = 221.5, I can't get an integer result, so the filter size seems unsuitable for the input size.

Can anyone explain this? Did I miss something or misunderstand the YOLO architecture?

Upvotes: 2

Views: 1727

Answers (2)

gl chen

Reputation: 36

As Hawx Won said, the input image is padded with 3 extra pixels on each side. Here is how that works in the source code.


For convolutional layers, if pad is enabled, the padding value of each layer is calculated as:

/* In parser.c */
if(pad) padding = size/2;

/* In convolutional_layer.c */
l.pad = padding;

Here, size is the width (and height) of the filter.


So, for the first layer: padding = size/2 = 7/2 = 3 (integer division).

Then the output of the first convolutional layer is (again with integer division, so 447/2 = 223):

output_w = (input_w+2*pad-size)/stride+1 = (448+6-7)/2+1 = 223+1 = 224

output_h = (input_h+2*pad-size)/stride+1 = (448+6-7)/2+1 = 223+1 = 224
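The calculation above can be sketched as a small C function. This is only an illustration of the arithmetic, not code from Darknet: the name conv_out_size and its pad_enabled parameter are hypothetical, and the only thing taken from the source is the padding = size/2 rule and the output formula.

```c
/* Hypothetical helper (not a Darknet function): output width or height
   of a convolutional layer, using the same integer arithmetic as the
   formula above. */
int conv_out_size(int in, int size, int stride, int pad_enabled)
{
    /* Integer division truncates, so a 7x7 filter gives 7/2 == 3. */
    int padding = pad_enabled ? size / 2 : 0;

    /* (in + 2*pad - size) / stride + 1, with truncating division. */
    return (in + 2 * padding - size) / stride + 1;
}
```

Calling conv_out_size(448, 7, 2, 1) gives 224, matching the first layer. The same function with a 3x3 filter, stride 1 and padding enabled keeps the size unchanged, e.g. conv_out_size(224, 3, 1, 1) is 224.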

Upvotes: 2

Hawx Won

Reputation: 21

Well, I spent some time reading the source code and learned that the input image is padded with 3 extra pixels on the top, bottom, left, and right, so the image size becomes 448 + 2x3 = 454. The output size with valid padding is then Output_size = ceil((W-F+1)/S) = ceil((454-7+1)/2) = 224, so the output of the first layer is 224x224x64.
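As a quick check of that formula, here is a minimal C sketch. Both function names are made up for illustration. The first computes ceil((W-F+1)/S) with integer arithmetic only, and the second uses the more common floor((W-F)/S)+1 form. For any W >= F and S >= 1 the two give the same result, so either can be applied to the already padded size.

```c
/* Hypothetical helpers for illustration; w is the padded input size,
   f the filter size, s the stride. */

/* ceil((w - f + 1) / s), written as an integer ceiling division. */
int valid_out_size_ceil(int w, int f, int s)
{
    return (w - f + 1 + s - 1) / s;
}

/* floor((w - f) / s) + 1, the usual "valid" convolution formula. */
int valid_out_size_floor(int w, int f, int s)
{
    return (w - f) / s + 1;
}
```

With w = 454, f = 7, s = 2, both return 224.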

I hope this helps.

Upvotes: 2
