tnbx_ve

Reputation: 21

ResNet doesn't train because of differences in images' sizes

So I have 30 folders with images inside them, and I wanted to train ResNet50 on them. I created a CustomDataset and put a Resize((224, 224)) inside it so that every image would have the same size.

Here's what I did:

import os

import pandas as pd
from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.io import read_image


class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file, sep=';')
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        # each image sits in a folder named after the first 9 characters of its file name plus '.tar'
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0][0:9] + '.tar', self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        # resize every image to 224 x 224
        transf = transforms.Resize((224, 224))
        image = transf(image)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

The dataset itself works; however, when the DataLoader tries to build a batch, it fails at entry 271 (which I don't know how to plot in order to look at the image) with this error:

Traceback (most recent call last):
  File "final.py", line 235, in <module>
    num_epochs = epochs, is_inception=(model_name== 'inception'))
  File "final.py", line 111, in train_model
    for input, labels in dataloaders[phase]:
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 172, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 172, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/fdalligna/.local/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 138, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [1, 224, 224] at entry 271

Does anyone by chance know how I can make all images of size [3, 224, 224]?

Thank you :)

Upvotes: 0

Views: 866

Answers (1)

Jake Tae

Reputation: 1741

As noted in the comments, the error suggests that your dataset contains both grayscale and RGB (color) images. Although every image has indeed been resized to 224 × 224, color images have 3 channels whereas grayscale images have only one, so the tensors cannot be stacked into a single batch.
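As a quick check, you can scan the dataset and count how many samples come out with one channel versus three (a minimal sketch, assuming your CustomImageDataset instance is called dataset):

from collections import Counter

# tally the channel dimension of every sample the dataset produces;
# anything counted under 1 is a grayscale image that breaks collation
channel_counts = Counter(dataset[i][0].size(0) for i in range(len(dataset)))
print(channel_counts)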

If you insist on training a network on this mixed dataset, you can either

  • Turn color images into grayscale
  • Modify grayscale images to have 3 channels to mimic RGB

From a network-training point of view, the first option makes more sense. It can be achieved by averaging a color image across its RGB channels.

    def __getitem__(self, idx):
        # unchanged from the question
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0][0:9] + '.tar', self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        transf = transforms.Resize((224, 224))
        image = transf(image)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)

        # check if color image
        if image.size(0) == 3:
            # average across the RGB channels; cast to float first because
            # mean() is not defined for the uint8 tensors read_image returns,
            # and keepdim=True keeps the result shaped [1, 224, 224]
            image = image.float().mean(dim=0, keepdim=True)
        return image, label

You should make sure that the input layer of the network expects single-channel images.
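For a torchvision ResNet50, for example, that would mean swapping out the first convolution. This is only a sketch; whether you keep the pretrained weights for the rest of the network is a separate choice:

import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)
# the stock first layer is Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False);
# replace it with a single-channel version so [1, 224, 224] inputs are accepted
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)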

If you want to go with option 2 instead, you can do

        # check if grayscale image
        if image.size(0) == 1:
            # repeat color channel
            image = image.repeat(3, 1, 1)
        return image, label

In this case, the model would expect three input channels, i.e., nn.Conv2d(in_channels=3, out_channels, kernel_size, ...).
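With option 2 you would not need to touch the first layer at all, since torchvision's stock ResNet50 already takes 3-channel input; only the classifier head has to match your 30 classes. A minimal sketch, assuming you fine-tune the stock model:

from torch import nn
from torchvision import models

model = models.resnet50(pretrained=True)
# conv1 already has in_channels=3, so grayscale images repeated to 3 channels fit as-is;
# only the final fully connected layer needs to output the 30 class scores
model.fc = nn.Linear(model.fc.in_features, 30)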

Upvotes: 1
