Reputation: 668
Given a pre-trained ResNet152, in trying to calculate predictions bench-marks using some common datasets (using PyTorch), and the first RGB dataset that came to mind was CIFAR10. The thing is that CIFAR10 data is 3x32x32
and ResNet expects 3x224x224
. I've resized the data using the known approach of transforms
:
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train = datasets.CIFAR10(root='./data', train=True, download=True, transform=preprocess)
test = datasets.CIFAR10(root='./data', train=False, download=True, transform=preprocess)
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size)
but this results in blurry samples and bad predictions. I was wondering what are the best approaches in those cases, as I see many papers using those datasets given advanced models like ResNes and VGG, and I'm not sure how this technical issue could be resolved.
thank you for your response!
Upvotes: 0
Views: 1813
Reputation: 11
For image classification benchmarks, I recommend you to use the commonly used datasets (also pre-defined inside torch vision) with higher resolution: LSUN or Places365.
Upvotes: 0
Reputation: 757
Yes, you need to resize input images to the size 3x224x224
. By doing so, after a normal training procedure, you should achieve outstanding results on CIFAR-10 (like 96% on the test-set).
I guess the main problem is that you're using a network that is pre-trained on higher resolution images (resnet152 comes pre-trained on imageNet), without any other training you can't expect good results changing the dataset drastically.
Upvotes: 1