Jitesh Malipeddi

Reputation: 2385

PyTorch Runtime Error - The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 0

I am trying to train a Faster RCNN network for object detection on a custom dataset of images. However, I don't want to feed an RGB image directly to the network; instead, I first pass it, along with the corresponding thermal image, through another network (a feature extractor) and give the extracted features to the FRCNN network as input. The feature extractor stacks the two images into a 4-channel tensor and outputs a 5-channel tensor. It is this 5-channel tensor that I want to give as input to the Faster RCNN network.

I followed the PyTorch docs for Object Detection Finetuning (link here) and came up with the following code to suit my dataset.

class CustomDataset(torch.utils.data.Dataset):

    def __getitem__(self, idx):
        self.num_classes = 5
        img_rgb_path = os.path.join(self.root, "rgb/", self.rgb_imgs[idx])
        img_thermal_path = os.path.join(self.root, "thermal/", self.thermal_imgs[idx])


        img_rgb = Image.open(img_rgb_path)
        img_rgb = np.array(img_rgb)
        x_rgb = TF.to_tensor(img_rgb)
        x_rgb.unsqueeze_(0)

        img_thermal = Image.open(img_thermal_path)
        img_thermal = np.array(img_thermal)
        img_thermal = np.expand_dims(img_thermal,-1)
        x_th = TF.to_tensor(img_thermal)
        x_th.unsqueeze_(0)       

        print(x_rgb.shape)  # shape of [3,640,512]
        print(x_th.shape) # shape of [1,640,512]

        input = torch.cat((x_rgb,x_th),dim=1) # shape of [4,640,512]


        img = self.feature_extractor(input) #  My custom feature extractor which returns a 5 dimensional tensor

        print(img.shape) # shape of [5,640,512]



        filename = os.path.join(self.root,'annotations',self.annotations[idx])
        tree = ET.parse(filename)
        objs = tree.findall('object')

        num_objs = len(objs)
        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        labels = np.zeros((num_objs), dtype=np.float32)
        seg_areas = np.zeros((num_objs), dtype=np.float32)

        boxes = []
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)

            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes.append([x1, y1, x2, y2])
            labels[ix] = cls
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        seg_areas = torch.as_tensor(seg_areas, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.float32)

        target =  {'boxes': boxes,
                'labels': labels,
                'seg_areas': seg_areas,
                }

        return img,target

My main function code is as follows

import utils


def train_model(model, criterion,dataloader,num_epochs):
    since = time.time()

    best_model = model
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)


        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

        # optimizer = lr_scheduler(optimizer, epoch)
        model.train()  # Set model to training mode

        running_loss = 0.0
        running_corrects = 0

        for data in dataloader:
            inputs, labels = data[0][0], data[1]

            inputs = inputs.to(device) 
            # zero the parameter gradients

            optimizer.zero_grad()

            # forward
            outputs = model(inputs, labels)
            _, preds = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()


            running_loss += loss.item()
            running_corrects += torch.sum(preds == labels).item()

        epoch_loss = running_loss / len(dataloader)
        epoch_acc = running_corrects / len(dataloader)

        print('{} Loss: {:.4f} Acc: {:.4f}'.format(
            phase, epoch_loss, epoch_acc))

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)
num_classes = 5

model = FasterRCNN(backbone = backbone,num_classes=5,rpn_anchor_generator=anchor_generator,box_roi_pool=roi_pooler)

dataset = CustomDataset('train_folder/')
data_loader_train = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True,collate_fn=utils.collate_fn)

train_model(model, criterion, data_loader_train, num_epochs=10)

The collate_fn defined in the utils.py file is the following

def collate_fn(batch):
    return tuple(zip(*batch))

I, however, get the following error while training

Traceback (most recent call last):
  File "train.py", line 147, in <module>
    train_model(model, criterion, data_loader_train, num_epochs)
  File "train.py", line 58, in train_model
    outputs = model(inputs, labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py", line 66, in forward
    images, targets = self.transform(images, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/transform.py", line 46, in forward
    image = self.normalize(image)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/transform.py", line 66, in normalize
    return (image - mean[:, None, None]) / std[:, None, None]
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 0

I am a newbie in PyTorch.

Upvotes: 0

Views: 6762

Answers (1)

dumbPy

Reputation: 1518

The backbone network you are using for the FasterRCNN is a pretrained mobilenet_v2. The number of input channels a network accepts is fixed by the data it was built and trained on. Since the backbone is pretrained (on natural images?) with 3-channel inputs of shape 3xNxM, you cannot use it on tensors of shape 5xPxQ (skipping the singleton <batch_size> dimension).
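
You can see this directly by inspecting the first convolution of the pretrained features, which is hard-wired to 3 input channels:

import torchvision

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
print(backbone[0][0])
# Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)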

Basically, you have two options:
1. Reduce the output channels of the first network to 3 (better if you are training it from scratch).
2. Build a new backbone for the FasterRCNN with 5 input channels and train it from scratch; see the sketch below.
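
For example, a minimal sketch of option 2 could look like the following (the backbone layout, anchor sizes, and normalization statistics are placeholder assumptions, not values from your setup). Note that the detection transform also needs 5-element image_mean/image_std so that the normalization step matches the 5-channel input:

import torch
import torch.nn as nn
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# A small untrained backbone whose first conv accepts 5 input channels.
# The layer sizes are illustrative; any conv stack exposing .out_channels works.
backbone = nn.Sequential(
    nn.Conv2d(5, 64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
)
backbone.out_channels = 256  # FasterRCNN reads this attribute

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],  # use ['0'] on newer torchvision
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=5,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler,
                   # the internal transform needs one mean/std per channel,
                   # otherwise normalize() fails exactly as in your traceback
                   image_mean=[0.485, 0.456, 0.406, 0.5, 0.5],  # illustrative values
                   image_std=[0.229, 0.224, 0.225, 0.5, 0.5])   # illustrative values

# quick sanity check with a dummy 5-channel image and one dummy box
model.train()
images = [torch.rand(5, 640, 512)]
targets = [{'boxes': torch.tensor([[10., 10., 100., 100.]]),
            'labels': torch.tensor([1])}]
losses = model(images, targets)
print({k: v.item() for k, v in losses.items()})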

As for the error message itself,

return (image - mean[:, None, None]) / std[:, None, None]

PyTorch is trying to normalize the input image, but your image has shape (5, M, N) while the tensors mean and std have 3 channels instead of 5, so the broadcast fails at dimension 0.
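
You can reproduce the same failure in isolation (the image shape here is made up to match your case):

import torch

image = torch.rand(5, 640, 512)             # a 5-channel "image" like yours
mean = torch.tensor([0.485, 0.456, 0.406])  # default 3-channel ImageNet mean used by the transform
std = torch.tensor([0.229, 0.224, 0.225])   # default 3-channel ImageNet std

# Same operation as torchvision's normalize(): broadcasting (3, 1, 1) against (5, H, W)
(image - mean[:, None, None]) / std[:, None, None]
# RuntimeError: The size of tensor a (5) must match the size of tensor b (3)
# at non-singleton dimension 0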

Upvotes: 1
