Reputation: 826
Consider the convolutional neural network (two convolutional layers):
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
The fully connected layer fc is supposed to receive 7*7*32 inputs. The line above:
out = out.reshape(out.size(0), -1)
leads to a tensor of size (32, 49).
This doesn't seem right, as the dimensions of the input to the dense layer are different. What am I missing here?
[Note that in PyTorch the input is in the following format: [N, C, H, W], so the number of channels comes before the height and width of the image.]
Upvotes: 2
Views: 2090
Reputation: 2289
If you print the output of each layer, you can easily see what you are missing.
import torch

def forward(self, x):
    print('input', x.size())
    out = self.layer1(x)
    print('layer1-output', out.size())
    out = self.layer2(out)
    print('layer2-output', out.size())
    out = out.reshape(out.size(0), -1)
    print('reshape-output', out.size())
    out = self.fc(out)
    print('Model-output', out.size())
    return out

model = ConvNet()
test_input = torch.rand(4, 1, 28, 28)
model(test_input)
OUTPUT:
input torch.Size([4, 1, 28, 28])
layer1-output torch.Size([4, 16, 14, 14])
layer2-output torch.Size([4, 32, 7, 7])
reshape-output torch.Size([4, 1568])
Model-output torch.Size([4, 10])
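As an aside, the (32, 49) shape reported in the question most likely comes from a missing batch dimension (an assumption, since the question doesn't show the actual input): if dim 0 of the layer2 output is the 32 channels instead of the batch, reshape(out.size(0), -1) flattens only the 7x7 spatial dimensions.
import torch

# Hypothetical reproduction: a layer2-shaped output without a batch dimension.
out = torch.rand(32, 7, 7)                  # (C, H, W) -- no batch dim
print(out.reshape(out.size(0), -1).size())  # torch.Size([32, 49])

# With the batch dimension in place, the same reshape flattens C*H*W as intended.
out = torch.rand(1, 32, 7, 7)               # (N, C, H, W)
print(out.reshape(out.size(0), -1).size())  # torch.Size([1, 1568])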
With kernel_size=5, stride=1, padding=2, the Conv2d layers don't change the height and width of the tensor; with that stride and padding they only change the number of channels. Each MaxPool2d layer halves the height and width of the tensor. Step by step (a runnable check follows this walkthrough):
input          = 4, 1, 28, 28
conv1_output   = 4, 16, 28, 28
max1_output    = 4, 16, 14, 14
conv2_output   = 4, 32, 14, 14
max2_output    = 4, 32, 7, 7
reshape_output = 4, 1568 (32*7*7)
fc_output      = 4, 10
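A minimal shape check of the walkthrough above, built from the same layer settings (untrained layers, shapes only):
import torch
import torch.nn as nn

x = torch.rand(4, 1, 28, 28)
conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = conv1(x); print(x.size())  # torch.Size([4, 16, 28, 28]) -- H, W unchanged
x = pool(x);  print(x.size())  # torch.Size([4, 16, 14, 14]) -- H, W halved
x = conv2(x); print(x.size())  # torch.Size([4, 32, 14, 14])
x = pool(x);  print(x.size())  # torch.Size([4, 32, 7, 7])
print(x.reshape(x.size(0), -1).size())  # torch.Size([4, 1568]) = (4, 32*7*7)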
N --> input size, F --> filter size, stride --> stride size, pdg --> padding size
ConvTranspose2d:
OutputSize = (N - 1)*stride + F - pdg*2
Conv2d:
OutputSize = (N + pdg*2 - F)/stride + 1 [integer division: e.g. 32/3 = 10, the part after the decimal point is ignored]
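To sanity-check these formulas, a small sketch applying them to the layers in this network (assuming integer/floor division, as the note above describes):
def conv2d_out(n, f, stride, pdg):
    return (n + 2 * pdg - f) // stride + 1

def convtranspose2d_out(n, f, stride, pdg):
    return (n - 1) * stride + f - 2 * pdg

print(conv2d_out(28, 5, 1, 2))           # 28 -- the conv layers here preserve H and W
print(conv2d_out(28, 2, 2, 0))           # 14 -- MaxPool2d output follows the same formula
print(convtranspose2d_out(14, 2, 2, 0))  # 28 -- a transposed conv would undo the pooling size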
Upvotes: 3