Resnet18 first layer output dimensions

Question

I am looking at the ResNet18 implementation in PyTorch. The first layer is a convolutional layer with filter size = 7, stride = 2, pad = 3. The standard input size to the network is 224x224x3. Based on these numbers, the output dimension is (224 + 3*2 - 7)/2 + 1 = 112.5, which is not an integer. Does the original implementation contain non-integer dimensions? I see that the network has adaptive pooling before the FC layer, so variable input dimensions aren't a problem (I tested this by varying the input size). Am I doing something wrong, or why would the authors choose a non-integer dimension when designing ResNet?

Michael Jungo · Accepted Answer

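The dimensions always have to be integers. From nn.Conv2d - Shape:

H_out = ⌊(H_in + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1⌋

(and likewise for W_out with index 1). With the default dilation of 1, the numerator reduces to H_in + 2 * padding - kernel_size, which is the expression used in the question.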

The brackets that are only closed towards the bottom denote the floor operation (round down). The calculation becomes:

import math

math.floor((224 + 3*2 - 7)/2 + 1) # => 112

# Or using the integer division (two slashes //)
(224 + 3*2 - 7) // 2 + 1 # => 112

Integer division has the same effect, since Python's // operator always rounds the result down to the nearest integer.
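To make the calculation reusable, here is a minimal sketch of the shape formula as a plain Python function. The name conv_output_size is a hypothetical helper chosen for illustration, not part of PyTorch; it only mirrors the formula from the nn.Conv2d documentation for a single spatial dimension.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension.

    Hypothetical helper (not a PyTorch function) that applies the
    floor formula from the nn.Conv2d shape documentation.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# First layer of ResNet18: 7x7 kernel, stride 2, padding 3
for size in (223, 224, 225):
    out = conv_output_size(size, kernel_size=7, stride=2, padding=3)
    print(size, "->", out)
# 223 -> 112
# 224 -> 112
# 225 -> 113
```

Because of the floor, both 223 and 224 map to 112. If PyTorch is installed, the result can be checked against a real layer: passing a 1x3x224x224 tensor through nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3) should give an output of shape 1x64x112x112.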
