Preparing binary masks for image segmentation

Question

I am trying to prepare masks for image segmentation with PyTorch. I have three questions about data preparation.

1. What is the appropriate file format for saving a binary mask in general? PNG? JPEG?
2. Does the mask need to be square, such as (224x224), rather than rectangular, such as (224x448)?
3. Do the mask values stay binary when the mask is resized from a rectangle to a square?

For example, the original mask is a (600x900) binary image with values 0 and 1. However, when I applied

import torchvision.transforms as transforms
transforms.Compose([
    transforms.Resize((300, 300)),
    transforms.ToTensor(),
])

to the mask, the output contained values other than 0 and 1 (0.01, 0.0156, 0.22...), because resizing interpolated between neighboring pixels.

I applied the code below to make the mask binary again: values less than or equal to 0.3 become 0, and all other values become 1.

def __getitem__(self, idx):
    img, mask = self.load_data(idx)
    if self.img_transforms is not None:
        img = self.img_transforms(img)
    if self.mask_transforms is not None:
        mask = self.mask_transforms(mask)
        # Re-binarize after resizing: values <= 0.3 become 0, the rest become 1
        mask = torch.where(mask <= 0.3, 0, 1)
    return img, mask

but I wonder whether this is a common and efficient approach.

Alexey Birukov · Accepted Answer

1. PNG, because it is lossless by design.
2. It depends. A standard resolution such as (224x224) is more convenient, so I would start with that.
3. Resize with nearest-neighbor interpolation, which copies existing pixel values instead of blending them: transforms.Resize((300, 300), interpolation=transforms.InterpolationMode.NEAREST)
