Reputation: 1
I am trying to train a Mask R-CNN image segmentation model on my custom dataset in MS COCO format.
I want to use the polygon masks as the input, but I cannot get them into the format my model expects.
My data looks like this:
{"id": 145010,
"image_id": 101953,
"category_id": 1040,
"segmentation": [[140.0, 352.5, 131.0, 351.5, 118.0, 344.5, 101.50000000000001, 323.0, 94.5, 303.0, 86.5, 292.0, 52.0, 263.5, 35.0, 255.5, 20.5, 240.0, 11.5, 214.0, 14.5, 190.0, 22.0, 179.5, 53.99999999999999, 170.5, 76.0, 158.5, 88.5, 129.0, 100.5, 111.0, 152.0, 70.5, 175.0, 65.5, 217.0, 64.5, 272.0, 48.5, 296.0, 56.49999999999999, 320.5, 82.0, 350.5, 135.0, 374.5, 163.0, 382.5, 190.0, 381.5, 205.99999999999997, 376.5, 217.0, 371.0, 221.5, 330.0, 229.50000000000003, 312.5, 240.0, 310.5, 291.0, 302.5, 310.0, 288.0, 326.5, 259.0, 337.5, 208.0, 339.5, 171.0, 349.5]],
"area": 73578.0,
"bbox": [11.5, 11.5, 341.0, 371.0],
"iscrowd": 0}
This image contains one object, hence one item each for segmentation and bbox. The segmentation values are the (x, y) vertex coordinates of the polygon, so their length differs from object to object.
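For reference, a polygon like this can be rasterized into a binary mask with pycocotools; this is only a minimal sketch, and the 400x400 size below is a placeholder, since the real size comes from the corresponding image entry:

from pycocotools import mask as mask_utils

h, w = 400, 400  # placeholder; in practice taken from the image entry
# ann["segmentation"] is the list of polygons shown above
rles = mask_utils.frPyObjects(ann["segmentation"], h, w)
rle = mask_utils.merge(rles)
binary_mask = mask_utils.decode(rle)  # numpy uint8 array of shape (h, w)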
Could anyone help me with this?
Upvotes: 0
Views: 2213
Reputation: 41
To manage COCO-formatted datasets you can use this repo. It provides classes you can instantiate from your annotation file, making it really easy to access the data.
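For example, a minimal sketch of loading an annotation file with pycocotools (the file name here is just an assumption):

from pycocotools.coco import COCO

coco = COCO("annotation.json")            # parses the annotation file
img_ids = coco.getImgIds()                # all image ids in the dataset
ann_ids = coco.getAnnIds(imgIds=img_ids[0])
anns = coco.loadAnns(ann_ids)             # annotation dicts for one image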
I don't know which implementation you are using, but if it's something like this tutorial, this piece of code might give you at least some ideas on how to solve your problem:
import os

import torch
from PIL import Image
from pycocotools.coco import COCO


class CocoDataset(torch.utils.data.Dataset):
    def __init__(self, dataset_dir, subset, transforms):
        dataset_path = os.path.join(dataset_dir, subset)
        ann_file = os.path.join(dataset_path, "annotation.json")
        self.imgs_dir = os.path.join(dataset_path, "images")
        self.coco = COCO(ann_file)
        self.img_ids = self.coco.getImgIds()
        self.transforms = transforms

    def __getitem__(self, idx):
        '''
        Args:
            idx: index of the sample to be fed
        Returns:
            PIL Image of shape (H, W) and a target dict containing:
                - boxes: FloatTensor[N, 4], N being the number of instances, with
                  bounding box coordinates in [x0, y0, x1, y1] format, ranging
                  from 0 to W and 0 to H;
                - labels: Int64Tensor[N], class label (0 is background);
                - image_id: Int64Tensor[1], unique id for each image;
                - area: Tensor[N], area of the bbox;
                - iscrowd: UInt8Tensor[N], True or False;
                - masks: UInt8Tensor[N, H, W], segmentation maps;
        '''
        img_id = self.img_ids[idx]
        img_obj = self.coco.loadImgs(img_id)[0]
        anns_obj = self.coco.loadAnns(self.coco.getAnnIds(img_id))
        img = Image.open(os.path.join(self.imgs_dir, img_obj['file_name']))
        # list comprehension may be slow here; consider vectorizing
        # COCO stores boxes as [x, y, w, h]; convert to [x0, y0, x1, y1]
        bboxes = [[ann['bbox'][0], ann['bbox'][1],
                   ann['bbox'][0] + ann['bbox'][2],
                   ann['bbox'][1] + ann['bbox'][3]] for ann in anns_obj]
        # annToMask rasterizes each polygon into a binary (H, W) mask
        masks = [self.coco.annToMask(ann) for ann in anns_obj]
        areas = [ann['area'] for ann in anns_obj]
        boxes = torch.as_tensor(bboxes, dtype=torch.float32)
        # single-class shortcut: every instance gets label 1 (0 is background);
        # with several categories, map ann['category_id'] to labels instead
        labels = torch.ones(len(anns_obj), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = torch.as_tensor(areas)
        iscrowd = torch.zeros(len(anns_obj), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.img_ids)
Once again, this is just a draft and meant to give tips.
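As a rough usage sketch (the paths, batch size, and num_classes below are assumptions, not part of the tutorial), this is roughly how such a dataset could be fed to torchvision's Mask R-CNN:

import torchvision
from torch.utils.data import DataLoader
from torchvision.transforms.functional import to_tensor

dataset = CocoDataset("path/to/dataset", "train", transforms=None)
# detection models expect a list of images and a list of targets per batch,
# so collate into tuples instead of stacking
loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

# num_classes=2 matches the single-class labels above (background + 1 class)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
model.train()

images, targets = next(iter(loader))
images = [to_tensor(img) for img in images]  # PIL -> FloatTensor[C, H, W]
loss_dict = model(images, targets)           # dict of training losses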
Upvotes: 3