I am working through tutorials on training and testing a convolutional neural network (CNN), and I am having trouble preparing a test image to run through the trained network. My initial guess is that the input tensor is not in the format the net expects.
Here is the code for the Net.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.nn.init as I

    class Net(nn.Module):

        def __init__(self):
            super(Net, self).__init__()

            ## 1. This network takes in a square (same width and height), grayscale image as input
            ## 2. It ends with a linear layer that represents the keypoints
            ## this last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs

            # input size 224 x 224
            # after the first conv layer, (W-F)/S + 1 = (224-5)/1 + 1 = 220
            # after one pool layer, this becomes (32, 110, 110)
            self.conv1 = nn.Conv2d(1, 32, 5)

            # maxpool layer
            # pool with kernel_size = 2, stride = 2
            self.pool = nn.MaxPool2d(2, 2)

            # second conv layer: 32 inputs, 64 outputs, 3x3 conv
            ## output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
            ## output dimension: (64, 108, 108)
            ## after another pool layer, this becomes (64, 54, 54)
            self.conv2 = nn.Conv2d(32, 64, 3)

            # third conv layer: 64 inputs, 128 outputs, 3x3 conv
            ## output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
            ## output dimension: (128, 52, 52)
            ## after another pool layer, this becomes (128, 26, 26)
            self.conv3 = nn.Conv2d(64, 128, 3)

            self.conv_drop = nn.Dropout(p=0.2)
            self.fc_drop = nn.Dropout(p=0.4)

            # 128 outputs * 26x26 filtered/pooled map = 86528
            self.fc1 = nn.Linear(128*26*26, 1000)
            self.fc2 = nn.Linear(1000, 1000)
            self.fc3 = nn.Linear(1000, 136)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = self.pool(F.relu(self.conv3(x)))
            x = self.conv_drop(x)

            # prep for linear layer
            # flattening
            x = x.view(x.size(0), -1)

            # three linear layers with dropout in between
            x = F.relu(self.fc1(x))
            x = self.fc_drop(x)
            x = self.fc2(x)
            x = self.fc_drop(x)
            x = self.fc3(x)
            return x
Maybe my calculation of the layer input sizes is wrong?
And here is the test-running code block: (you can think of 'roi' as a standard numpy image.)
    # loop over the detected faces from your haar cascade
    for i, (x,y,w,h) in enumerate(faces):
        ax = plt.subplot(1, len(faces), i+1)

        # Select the region of interest that is the face in the image
        roi = image_copy[y:y+h, x:x+w]

        ## TODO: Convert the face region from RGB to grayscale
        roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)

        ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
        roi = np.multiply(roi, 1/255)

        ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
        roi = cv2.resize(roi, (244,244))
        roi = roi.reshape(roi.shape[0], roi.shape[1], 1)
        roi = roi.transpose((2, 0, 1))

        ## TODO: Change to tensor
        roi = torch.from_numpy(roi)
        roi = roi.type(torch.FloatTensor)
        roi = roi.unsqueeze(0)
        print(roi.shape)

        ## TODO: run it through the net
        output_pts = net(roi)
And I get the error message saying:
RuntimeError: size mismatch, m1: [1 x 100352], m2: [86528 x 1000] at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:2033
The caveat is that if I run my trained network in the provided test suite (where the tensor inputs are already prepared), it runs without errors, as it is supposed to. I take that to mean there is nothing wrong with the network architecture itself, and that the problem lies in how I am preparing the image.
The output of the 'roi.shape' is:
torch.Size([1, 1, 244, 244])
This should be okay, because the expected layout is [batch_size, color_channel, height, width].
UPDATE: I printed the tensor shapes at each layer while running an image through the net. It turns out that the input dimensions reaching the first FC layer differ between my test image and the images from the provided test suite. So I'm almost 80% sure that the way I prepare the input image is wrong. But how can the FC input dimensions differ if the input tensor in both cases has the exact same dimensions ([1,1,244,244])?
when using the provided test suite (where it runs fine):
input: torch.Size([1, 1, 224, 224])
layer before 1st CV: torch.Size([1, 1, 224, 224])
layer after 1st CV pool: torch.Size([1, 32, 110, 110])
layer after 2nd CV pool: torch.Size([1, 64, 54, 54])
layer after 3rd CV pool: torch.Size([1, 128, 26, 26])
flattend layer for the 1st FC: torch.Size([1, 86528])
When prepping/running the test image:
input: torch.Size([1, 1, 244, 244])
layer before 1st CV: torch.Size([1, 1, 244, 244])
layer after 1st CV pool: torch.Size([1, 32, 120, 120]) #<- what happened here??
layer after 2nd CV pool: torch.Size([1, 64, 59, 59])
layer after 3rd CV pool: torch.Size([1, 128, 28, 28])
flattend layer for the 1st FC: torch.Size([1, 100352])
Did you notice that you have this line in the image preparation?
## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
roi = cv2.resize(roi, (244,244))
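So you resized the image to 244x244, not to 224x224. That is why the flattened tensor has 100352 features instead of the 86528 that fc1 was built for. Changing the call to cv2.resize(roi, (224, 224)) should resolve the size mismatch.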
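To see where the two numbers in the error message come from, here is a minimal sketch that redoes the size arithmetic from the comments in Net in plain Python. The helper names (conv_out, pool_out, flattened_features) are made up for this illustration and are not part of PyTorch. It assumes no padding, stride 1 for the convolutions, and a 2x2 max pool with stride 2, as in the posted network.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: (W - F + 2P) / S + 1, rounded down."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling (rounded down, no padding)."""
    return (size - kernel) // stride + 1


def flattened_features(input_size, conv_kernels=(5, 3, 3), out_channels=128):
    """Number of values reaching fc1 for a square input of the given size.

    Mirrors the posted Net: three conv layers (5x5, 3x3, 3x3),
    each followed by a 2x2 max pool, ending with 128 channels.
    """
    size = input_size
    for kernel in conv_kernels:
        size = pool_out(conv_out(size, kernel))
    return out_channels * size * size


print(flattened_features(224))  # 86528, matches nn.Linear(128*26*26, 1000)
print(flattened_features(244))  # 100352, the m1 size in the error message
```

A 224x224 input ends up as 128 x 26 x 26 = 86528 values, while a 244x244 input ends up as 128 x 28 x 28 = 100352 values. Those are exactly the two sizes that appear in the RuntimeError.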
