sensationti

Reputation: 337

Output Dimensions of convolution in PyTorch

The size of my input images is 68 x 224 x 3 (HxWxC), and the first Conv2d layer is defined as

conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9)).

Why is the size of the output feature volume 16 x 15 x 54? I get that there are 16 filters, so there is a 16 in front, but if I use [(W−K+2P)/S]+1 to calculate the dimensions, the division does not come out to a whole number.

Can someone please explain?

Upvotes: 8

Views: 16535

Answers (2)

San Askaruly

Reputation: 341

I had the same trouble estimating the output size of a tensor after a convolutional layer. Check out a helper function I implemented at https://github.com/tuttelikz/conv_output_size.

Example:

import torch
import torch.nn as nn
from conv_output_size import conv2d_output_size

c_i, c_o = 3, 16
k, s, p = 3, 2, 1

sample_2d_tensor = torch.ones((c_i, 64, 64))
c2d = nn.Conv2d(in_channels=c_i, out_channels=c_o, kernel_size=k,
                stride=s, padding=p)

output_size = conv2d_output_size(
    sample_2d_tensor.shape, out_channels=c_o, kernel_size=k, stride=s, padding=p)

print("After conv2d")
print("Dummy input size:", sample_2d_tensor.shape)
print("Calculated output size:", output_size)
print("Real output size:", c2d(sample_2d_tensor).detach().numpy().shape)

>>> After conv2d
>>> Dummy input size: torch.Size([3, 64, 64])
>>> Calculated output size: (16, 32, 32)
>>> Real output size: (16, 32, 32)

Upvotes: 1

yakhyo

Reputation: 1656

The feature-map size is [(W−K+2P)/S]+1, where the [ ] brackets mean floor (round the division down). In your example the padding is zero, so the height is [(68-9+2*0)/4]+1 = [14.75]+1 = 14+1 = 15 and the width is [(224-9+2*0)/4]+1 = [53.75]+1 = 53+1 = 54.

import torch

conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9))
input = torch.rand(1, 3, 68, 224)

print(conv1(input).shape)
# torch.Size([1, 16, 15, 54])
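
The same arithmetic can be checked without PyTorch. This is a minimal sketch of the floor formula above; the function name `conv_output_size` is made up for illustration and is not part of any library.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((W - K + 2P) / S) + 1."""
    # Integer floor division (//) plays the role of the [ ] brackets.
    return (size - kernel + 2 * padding) // stride + 1


# Values from the question: 68 x 224 input, 9 x 9 kernel, stride 4, no padding.
height = conv_output_size(68, kernel=9, stride=4)
width = conv_output_size(224, kernel=9, stride=4)
print(height, width)  # 15 54
```

The channel dimension (16) is not computed by this formula; it is simply the number of filters.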

You may see different formulas to calculate feature maps.

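In PyTorch (from the Conv2d documentation):

H_out = floor((H_in + 2*padding − dilation*(kernel_size − 1) − 1) / stride + 1)

In general, you may see this:

W_out = floor((W − K + 2P) / S) + 1

However, both formulas give the same result when dilation is 1 (the default).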
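
To see why the two forms agree, here is a small sketch that evaluates both side by side. It assumes the PyTorch documentation's formula floor((H_in + 2*padding − dilation*(kernel_size − 1) − 1) / stride + 1); the function names are hypothetical.

```python
def out_size_general(size, kernel, stride=1, padding=0):
    # Textbook form: floor((W - K + 2P) / S) + 1
    return (size - kernel + 2 * padding) // stride + 1


def out_size_pytorch(size, kernel, stride=1, padding=0, dilation=1):
    # Form used in the PyTorch Conv2d docs; it also accounts for dilation.
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# With dilation = 1 the numerators are identical, so the results match.
for size in (68, 224, 64):
    for kernel, stride, padding in ((9, 4, 0), (3, 2, 1), (5, 1, 2)):
        a = out_size_general(size, kernel, stride, padding)
        b = out_size_pytorch(size, kernel, stride, padding)
        assert a == b, (size, kernel, stride, padding)

print(out_size_pytorch(68, 9, stride=4), out_size_pytorch(224, 9, stride=4))  # 15 54
```

The two only diverge when dilation is greater than 1, which the shorter formula does not model.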

Upvotes: 11
