gopalkrizna

Reputation: 826

How does the reshape work before the fully connected layer in the following CNN model?

Consider the convolutional neural network (two convolutional layers):

import torch
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

The fully connected layer fc expects 7*7*32 input features. However, the line

out = out.reshape(out.size(0), -1) seems to produce a tensor of size (32, 49). This doesn't seem right, because it does not match the input dimension of the dense layer. What am I missing here?

[Note that in PyTorch the input has the format [N, C, H, W], so the number of channels comes before the height and width of the image.]

source: https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/convolutional_neural_network/main.py#L35-L56

Upvotes: 2

Views: 2090

Answers (1)

Salih Karagoz

Reputation: 2289

If you look at the output of each layer you can easily understand what you are missing.

def forward(self, x):
    # Print the tensor shape after every stage of the network.
    print('input', tuple(x.size()))
    out = self.layer1(x)
    print('layer1-output', tuple(out.size()))
    out = self.layer2(out)
    print('layer2-output', tuple(out.size()))
    out = out.reshape(out.size(0), -1)  # keep the batch dimension, flatten the rest
    print('reshape-output', tuple(out.size()))
    out = self.fc(out)
    print('Model-output', tuple(out.size()))
    return out

# Assumes the forward method above replaces ConvNet.forward.
model = ConvNet()
test_input = torch.rand(4, 1, 28, 28)
model(test_input)

OUTPUT:

input (4, 1, 28, 28)
layer1-output (4, 16, 14, 14)
layer2-output (4, 32, 7, 7)
reshape-output (4, 1568)
Model-output (4, 10)

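With kernel_size=5, stride=1 and padding=2, the Conv2d layers here don't change the height and width of the tensor; they only change the number of channels. Each MaxPool2d layer (kernel_size=2, stride=2) halves the height and width. The reshape keeps the batch dimension (out.size(0)) and flattens everything else, so each sample becomes a vector of 32*7*7 = 1568 values, which is exactly what fc expects.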

input          = 4,1,28,28
conv1_output   = 4,16,28,28
max1_output    = 4,16,14,14
conv2_output   = 4,32,14,14
max2_output    = 4,32,7,7
reshape_output = 4,1568 (32*7*7)
fc_output      = 4,10
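
The reshape step in the trace above keeps the batch dimension and lets -1 absorb everything else, so 32 channels of size 7x7 become 32*7*7 = 1568 features per sample. Here is a minimal sketch of that arithmetic in plain Python. Note that flattened_shape is a hypothetical helper written for illustration, not a PyTorch function:

```python
from math import prod

def flattened_shape(shape):
    """Shape produced by reshape(shape[0], -1): the batch dimension is kept
    and all remaining dimensions are multiplied together."""
    batch, *rest = shape
    return (batch, prod(rest))

# Hypothetical helper applied to the layer2 output shape from the trace.
print(flattened_shape((4, 32, 7, 7)))  # (4, 1568)
```

The batch size only affects the first entry, so the second entry always matches the 7*7*32 input size of fc.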

N --> input size, F --> filter (kernel) size, stride --> stride size, pdg --> padding size

ConvTranspose2d (with dilation 1 and no output padding):

OutputSize = N*stride + F - stride - pdg*2

Conv2d (with dilation 1):

OutputSize = floor((N - F + pdg*2)/stride) + 1 [the division rounds down, e.g. 32/3 gives 10]
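
To check the formulas against the model in the question, here is a small sketch in plain Python. The function names are made up for this example, and it assumes dilation 1 and no output padding. With those defaults, MaxPool2d follows the same size rule as Conv2d, so the same function is reused for the pooling layers:

```python
def conv2d_out(n, f, stride=1, pdg=0):
    """Output size of Conv2d (or MaxPool2d) along one spatial dimension,
    assuming dilation 1. Integer division rounds down."""
    return (n + 2 * pdg - f) // stride + 1

def conv_transpose2d_out(n, f, stride=1, pdg=0):
    """Output size of ConvTranspose2d along one spatial dimension,
    assuming dilation 1 and no output padding."""
    return (n - 1) * stride - 2 * pdg + f

size = 28
size = conv2d_out(size, f=5, stride=1, pdg=2)  # conv1: stays 28
size = conv2d_out(size, f=2, stride=2)         # pool1: 14
size = conv2d_out(size, f=5, stride=1, pdg=2)  # conv2: stays 14
size = conv2d_out(size, f=2, stride=2)         # pool2: 7
print(size, 32 * size * size)                  # 7 1568
```

The final 7x7 spatial size with 32 channels gives the 1568 input features of the fully connected layer.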

Upvotes: 3
