Reputation: 1
I am trying to train a Mask R-CNN image segmentation model on my custom dataset in MS COCO format.
I want to use the polygon masks as the input, but I cannot get them into the format my model expects.
My data looks like this:
{"id": 145010,
"image_id": 101953,
"category_id": 1040,
"segmentation": [[140.0, 352.5, 131.0, 351.5, 118.0, 344.5, 101.50000000000001, 323.0, 94.5, 303.0, 86.5, 292.0, 52.0, 263.5, 35.0, 255.5, 20.5, 240.0, 11.5, 214.0, 14.5, 190.0, 22.0, 179.5, 53.99999999999999, 170.5, 76.0, 158.5, 88.5, 129.0, 100.5, 111.0, 152.0, 70.5, 175.0, 65.5, 217.0, 64.5, 272.0, 48.5, 296.0, 56.49999999999999, 320.5, 82.0, 350.5, 135.0, 374.5, 163.0, 382.5, 190.0, 381.5, 205.99999999999997, 376.5, 217.0, 371.0, 221.5, 330.0, 229.50000000000003, 312.5, 240.0, 310.5, 291.0, 302.5, 310.0, 288.0, 326.5, 259.0, 337.5, 208.0, 339.5, 171.0, 349.5]],
"area": 73578.0,
"bbox": [11.5, 11.5, 341.0, 371.0],
"iscrowd": 0}
This image contains one object, hence one item each for segmentation and bbox. The segmentation values are the (x, y) vertex coordinates of the polygon, so their length differs from object to object.
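For reference, a polygon like this can be rasterized into a binary mask with pycocotools; this is only a minimal sketch, and the 400x400 size below is a placeholder, since the real size comes from the corresponding image entry:

from pycocotools import mask as mask_utils

h, w = 400, 400  # placeholder; in practice taken from the image entry
# ann["segmentation"] is the list of polygons shown above
rles = mask_utils.frPyObjects(ann["segmentation"], h, w)
rle = mask_utils.merge(rles)
binary_mask = mask_utils.decode(rle)  # numpy uint8 array of shape (h, w)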
Could anyone help me with this?
Upvotes: 0
Views: 2213
Reputation: 41
To manage COCO-formatted datasets you can use this repo. It provides classes you can instantiate from your annotation file, making it really easy to access the data.
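For example, a minimal sketch of loading an annotation file with pycocotools (the file name here is just an assumption):

from pycocotools.coco import COCO

coco = COCO("annotation.json")            # parses the annotation file
img_ids = coco.getImgIds()                # all image ids in the dataset
ann_ids = coco.getAnnIds(imgIds=img_ids[0])
anns = coco.loadAnns(ann_ids)             # annotation dicts for one image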
I don't know which implementation you are using, but if it's something like this tutorial, this piece of code might give you at least some ideas on how to solve your problem:
import os

import torch
from PIL import Image
from pycocotools.coco import COCO


class CocoDataset(torch.utils.data.Dataset):
    def __init__(self, dataset_dir, subset, transforms):
        dataset_path = os.path.join(dataset_dir, subset)
        ann_file = os.path.join(dataset_path, "annotation.json")
        self.imgs_dir = os.path.join(dataset_path, "images")
        self.coco = COCO(ann_file)
        self.img_ids = self.coco.getImgIds()
        self.transforms = transforms

    def __getitem__(self, idx):
        '''
        Args:
            idx: index of the sample to be fed
        Returns:
            PIL Image of shape (H, W) and a target dict containing:
                - boxes: FloatTensor[N, 4], N being the number of instances, with
                  bounding box coordinates in [x0, y0, x1, y1] format, ranging
                  from 0 to W and 0 to H;
                - labels: Int64Tensor[N], class label (0 is background);
                - image_id: Int64Tensor[1], unique id for each image;
                - area: Tensor[N], area of the bbox;
                - iscrowd: UInt8Tensor[N], True or False;
                - masks: UInt8Tensor[N, H, W], segmentation maps;
        '''
        img_id = self.img_ids[idx]
        img_obj = self.coco.loadImgs(img_id)[0]
        anns_obj = self.coco.loadAnns(self.coco.getAnnIds(img_id))
        img = Image.open(os.path.join(self.imgs_dir, img_obj['file_name']))
        # list comprehension may be slow here; consider vectorizing
        # COCO stores boxes as [x, y, w, h]; convert to [x0, y0, x1, y1]
        bboxes = [[ann['bbox'][0], ann['bbox'][1],
                   ann['bbox'][0] + ann['bbox'][2],
                   ann['bbox'][1] + ann['bbox'][3]] for ann in anns_obj]
        # annToMask rasterizes each polygon into a binary (H, W) mask
        masks = [self.coco.annToMask(ann) for ann in anns_obj]
        areas = [ann['area'] for ann in anns_obj]
        boxes = torch.as_tensor(bboxes, dtype=torch.float32)
        # single-class shortcut: every instance gets label 1 (0 is background);
        # with several categories, map ann['category_id'] to labels instead
        labels = torch.ones(len(anns_obj), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = torch.as_tensor(areas)
        iscrowd = torch.zeros(len(anns_obj), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.img_ids)
Once again, this is just a draft and meant to give tips.
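As a rough usage sketch (the paths, batch size, and num_classes below are assumptions, not part of the tutorial), this is roughly how such a dataset could be fed to torchvision's Mask R-CNN:

import torchvision
from torch.utils.data import DataLoader
from torchvision.transforms.functional import to_tensor

dataset = CocoDataset("path/to/dataset", "train", transforms=None)
# detection models expect a list of images and a list of targets per batch,
# so collate into tuples instead of stacking
loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

# num_classes=2 matches the single-class labels above (background + 1 class)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
model.train()

images, targets = next(iter(loader))
images = [to_tensor(img) for img in images]  # PIL -> FloatTensor[C, H, W]
loss_dict = model(images, targets)           # dict of training losses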
Upvotes: 3