kokonut

Reputation: 25

PyTorch and Convolutional Neural Networks

I have an image input of 340px*340px and I want to classify it into 2 classes. I want to create a convolutional neural network (PyTorch framework). I have a problem with the input and output sizes of the layers.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 3 channels (RGB), kernel=5, but I don't understand why 6.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        # why 16?
        self.conv2 = nn.Conv2d(6, 16, 5)
        # why 107584 = 328*328
        self.fc1 = nn.Linear(107584, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # I don't understand this line
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Is this network correct?

Upvotes: 0

Views: 844

Answers (2)

Ashwath S

Reputation: 79

You haven't explained what your exact issue is, but I will try to answer the questions in the comments:

  1. Why 6: 6 is the number of output channels you want the convolution to produce. As you go through the convolutional layers, you typically increase the number of channels while the spatial size decreases; this increases the receptive field as you go deeper, leading to a hierarchy where low-level features, like edges and shapes, are detected in the initial layers, while higher-level features are detected in the layers nearing the end.

  2. Why 16: Again, you want to increase the number of channels as you go deeper so that more features can be learnt.

  3. Why 107584: The logic of this model is that a normal feed-forward net takes over after the initial 2 layers of convolution. The output after the first convolution and pooling is 168x168x6 (the 5x5 convolution shrinks 340 to 336, and the 2x2 pooling halves it to 168). After the second convolution and pooling this becomes 82x82x16 (168 shrinks to 164, then halves to 82); 6 and 16 are the numbers of channels, while 168x168 and 82x82 are the spatial sizes. So when you flatten the tensor using the .view function, you get 82*82*16 = 107584, which corresponds to the input size given to the fc1 layer. The view call basically flattens the tensor (see the shape check below).
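
To make the arithmetic concrete, here is a minimal shape check; this is a sketch that assumes the Net class from the question and feeds it a dummy batch of one 340x340 RGB image:

import torch
import torch.nn.functional as F

net = Net()
x = torch.randn(1, 3, 340, 340)      # dummy batch: 1 RGB image, 340x340
x = net.pool(F.relu(net.conv1(x)))   # conv: 340 -> 336, pool: -> 168
print(x.shape)                       # torch.Size([1, 6, 168, 168])
x = net.pool(F.relu(net.conv2(x)))   # conv: 168 -> 164, pool: -> 82
print(x.shape)                       # torch.Size([1, 16, 82, 82])
x = x.view(x.size(0), -1)            # flatten: 16*82*82 = 107584
print(x.shape)                       # torch.Size([1, 107584])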

Hope this helps!

Upvotes: 0

nicobonne

Reputation: 688

# 3 channels (RGB), kernel=5, but I don't understand why 6.

The second parameter of Conv2d is out_channels. In a convolutional layer you can choose the number of output channels arbitrarily; it's set to 6 here simply because whoever wrote the original example chose 6.
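
As a quick sanity check, here is a minimal sketch (the 340x340 RGB input matches the question) showing that the second argument just sets how many filters, and hence output channels, the layer produces:

import torch
import torch.nn as nn

# out_channels=6 is arbitrary; any value works as long as the next
# layer's in_channels matches it.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
out = conv(torch.randn(1, 3, 340, 340))
print(out.shape)  # torch.Size([1, 6, 336, 336])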

# why 16?

Same as above.

# why 107584 = 328*328

and

# I don't understand this line

Tensor.view() returns a new tensor with the same data as the self tensor but with a different shape. In x = x.view(x.size(0), -1), the -1 means "infer this dimension from the remaining ones", so you are reshaping the tensor from [batch, 16, 82, 82] to [batch, 16*82*82] => [batch, 107584].

107584 is therefore exactly the right input size for self.fc1 = nn.Linear(107584, 120); it matches the flattened 16*82*82 feature map.
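
A tiny sketch of how the -1 is inferred (the shapes assume the network above):

import torch

t = torch.randn(2, 16, 82, 82)   # e.g. a batch of 2 feature maps
flat = t.view(t.size(0), -1)     # -1 is inferred as 16*82*82 = 107584
print(flat.shape)                # torch.Size([2, 107584])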

Upvotes: 1
