Reputation: 4180
I am trying to implement a transfer learning approach in PyTorch. This is the dataset that I am using: Dog-Breed
Here's the step that I am following.
1. Load the data and read csv using pandas.
2. Resize (60, 60) the train images and store them as numpy array.
3. Apply stratification and split the train data into 7:1:2 (train:validation:test)
4. use the resnet18 model and train.
Location of dataset
LABELS_LOCATION = './dataset/labels.csv'
TRAIN_LOCATION = './dataset/train/'
TEST_LOCATION = './dataset/test/'
ROOT_PATH = './dataset/'
Reading CSV (labels.csv)
def read_csv(csvf):
# print(pandas.read_csv(csvf).values)
data=pandas.read_csv(csvf).values
labels_dict = dict(data)
idz=list(labels_dict.keys())
clazz=list(labels_dict.values())
return labels_dict,idz,clazz
I did this because of a constraint which I will mention next when I am loading the data using DataLoader.
def class_hashmap(class_arr):
uniq_clazz = Counter(class_arr)
class_dict = {}
for i, j in enumerate(uniq_clazz):
class_dict[j] = i
return class_dict
labels, ids, class_names = read_csv(LABELS_LOCATION)
train_images = os.listdir(TRAIN_LOCATION)
class_numbers = class_hashmap(class_names)
Next, I resize the image to 60,60 using opencv
, and store the result as numpy array.
resize = []
indexed_labels = []
for t_i in train_images:
# resize.append(transform.resize(io.imread(TRAIN_LOCATION+t_i), (60, 60, 3))) # (60,60) is the height and widht; 3 is the number of channels
resize.append(cv2.resize(cv2.imread(TRAIN_LOCATION+t_i), (60, 60)).reshape(3, 60, 60))
indexed_labels.append(class_numbers[labels[t_i.split('.')[0]]])
resize = np.asarray(resize)
print(resize.shape)
Here in indexed_labels, I give each label a number.
Next, I split the data into 7:1:2 part
X = resize # numpy array of images [training data]
y = np.array(indexed_labels) # indexed labels for images [training labels]
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sss.get_n_splits(X, y)
for train_index, test_index in sss.split(X, y):
X_temp, X_test = X[train_index], X[test_index] # split train into train and test [data]
y_temp, y_test = y[train_index], y[test_index] # labels
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.123, random_state=0)
sss.get_n_splits(X_temp, y_temp)
for train_index, test_index in sss.split(X_temp, y_temp):
print("TRAIN:", train_index, "VAL:", test_index)
X_train, X_val = X[train_index], X[test_index] # training and validation data
y_train, y_val = y[train_index], y[test_index] # training and validation labels
Next, I loaded the data from the previous step into torch DataLoaders
batch_size = 500
learning_rate = 0.001
train = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=False)
val = torch.utils.data.TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
val_loader = torch.utils.data.DataLoader(val, batch_size=batch_size, shuffle=False)
test = torch.utils.data.TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)
# print(train_loader.size)
dataloaders = {
'train': train_loader,
'val': val_loader
}
Next, I load the pretrained rensnet model.
model_ft = models.resnet18(pretrained=True)
# freeze all model parameters
# for param in model_ft.parameters():
# param.requires_grad = False
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, len(class_numbers))
if use_gpu:
model_ft = model_ft.cuda()
model_ft.fc = model_ft.fc.cuda()
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.fc.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)
And then I use the train_model, a method described here in PyTorch's docs.
However, when I run this I get an error.
Traceback (most recent call last):
File "/Users/nirvair/Sites/pyTorch/TL.py",
line 244, in <module>
num_epochs=25)
File "/Users/nirvair/Sites/pyTorch/TL.py", line 176, in train_model
outputs = model(inputs)
File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/Library/Python/2.7/site-packages/torchvision/models/resnet.py", line 149, in forward
x = self.avgpool(x)
File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/Library/Python/2.7/site-packages/torch/nn/modules/pooling.py", line 505, in forward
self.padding, self.ceil_mode, self.count_include_pad)
File "/Library/Python/2.7/site-packages/torch/nn/functional.py", line 264, in avg_pool2d
ceil_mode, count_include_pad)
File "/Library/Python/2.7/site-packages/torch/nn/_functions/thnn/pooling.py", line 360, in forward
ctx.ceil_mode, ctx.count_include_pad)
RuntimeError: Given input size: (512x2x2). Calculated output size: (512x0x0). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/SpatialAveragePooling.c:64
I can't seem to figure out what's going wrong here.
Upvotes: 2
Views: 6937
Reputation: 1
I would just add to @Mo Hossny answer that the input shape is not required to be
224x224 (at least).
Actually, it should be at least 33x33. But 224x224 is the recommended shape as the network was initially trained on such input shape. The right thing @Mo Hossny has stated is the dimensionality reduction nature of CNN networks as ResNet18. However, due to the AdaptiveAvgPool2d layer ("adaptive" term allows to pass images larger and smaller than reccomended shape), obtained feature maps just before this layer are teoretically as follows:
Input shape feature maps(input to AdaptiveAvgPool2d)
3x32x32 --> batch_size x 512 x (cannot reduce dimensionality more)
3x33x33 --> batch_size x 512 x 2x2
3x64x64 --> batch_size x 512 x 2x2
3x128x128 --> batch_size x 512 x 4x4
3x224x224 --> batch_size x 512 x 7x7
3x256x256 --> batch_size x 512 x 8x8
Experimenting with resnet18 as a model:
model(torch.rand(1, 3, 33, 33))
works, where
model(torch.rand(1, 3, 32, 32))
results in ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
Therefore, your size ought not to be the problem in that case.
Upvotes: 0
Reputation: 742
Your network is too deep for the size of images you are using (60x60). As you know, the CNN layers do produce smaller and smaller feature maps as the input image propagate through the layers. This is because you are not using padding.
The error you have simply says that the next layer is expecting 512 feature maps with a size of 2 pixels by 2 pixels. The actual feature map produced from the forward pass was 512 maps of size 0x0. This mismatch is what triggered the error.
Generally, all stock networks, such as RESNET-18, Inception, etc, require the input images to be of the size 224x224 (at least). You can do this easier using the torchvision transforms
[1]. You can also use larger image sizes with one exception for the AlexNet which has the size of feature vector hardcoded as explained in my answer in [2].
Bonus Tip: If you are using the network in pre-tained mode, you will need to whiten the data using the parameters in the pytorch documentation at [3].
Links
Upvotes: 5