Dang Manh Truong

Reputation: 656

How to resize image segmentation mask?

I have a binary mask for each image, with each mask pixel having a value of either 0 or 255. Because my image segmentation model needs images of a fixed size, I have to resize both the images and the masks. However, when I resize a mask, some pixels end up with values greater than 0 but smaller than 255. How do I decide which value to keep? The library requires every mask pixel to be either 0 or 255. Please help me, thank you very much.

Upvotes: 6

Views: 6388

Answers (2)

Shai

Reputation: 114866

If you resize using any interpolation other than nearest-neighbor, you will indeed get intermediate values in the range [0, 255]. This is not necessarily a bad thing. If your loss function is cross-entropy, you can think of these values as "soft labels": such a pixel does not have a "hard" assignment to either target, but rather a "soft", probabilistic assignment to both.
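A minimal NumPy sketch of the soft-label idea, assuming the mask has already been resized with some non-nearest interpolation: divide it by 255 to get a per-pixel foreground probability and use that directly as the cross-entropy target. The helper names `soft_targets_from_mask` and `binary_cross_entropy` are hypothetical, and the mask and prediction values are made up for illustration.

```python
import numpy as np

def soft_targets_from_mask(mask):
    """Map a resized 0..255 mask to per-pixel foreground probabilities in [0, 1]."""
    return mask.astype(np.float64) / 255.0

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise BCE; `target` may be soft (any value in [0, 1])."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

# Hypothetical 2x2 patch of a bilinearly resized mask; 128 is a boundary pixel.
mask = np.array([[0, 128], [255, 255]], dtype=np.uint8)
target = soft_targets_from_mask(mask)   # the 128 pixel becomes ~0.502
pred = np.array([[0.1, 0.5], [0.9, 0.8]])  # made-up network output
loss = binary_cross_entropy(pred, target)
```

The boundary pixel simply contributes a target of about 0.5 instead of being forced to 0 or 1. Frameworks generally accept such targets directly; for example, PyTorch's `torch.nn.BCEWithLogitsLoss` takes float targets between 0 and 1.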

Generalizing this to multi-class segmentation masks and to more complex geometric augmentations (e.g., rotations, affine transforms, ...), the question becomes:
How to correctly apply augmentations to discrete target images?

For instance, suppose you have a semantic segmentation mask with 81 classes (that is, each pixel has a value in {0, 1, ..., 80} indicating its class). These target masks are stored as indexed RGB images. You want to apply the same geometric augmentation to the input image and to the target mask.

The "quick and dirty" way would be to use nearest-neighbor interpolation, as proposed by Amitay Nachmani.

A more "accurate" and "correct" way is to convert the target mask from an HxW discrete (integer) mask into a CxHxW probability map, where channel c holds, for every pixel, the probability that the pixel belongs to class c (here C = 81).
Note that this is not the predicted segmentation (the output of the net), but the target the net should predict. In this representation, each target pixel is a 1-hot 81-dimensional vector.
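A sketch of the HxW to CxHxW conversion in NumPy, assuming the indexed image has already been decoded into an integer array of class ids (`to_one_hot` is a hypothetical helper name, and the tiny mask is illustrative):

```python
import numpy as np

def to_one_hot(mask, num_classes):
    """HxW integer class mask -> CxHxW float map with a single 1.0 per pixel."""
    # Indexing the identity matrix with the mask gives HxWxC;
    # move the class axis to the front to get CxHxW.
    one_hot = np.eye(num_classes, dtype=np.float32)[mask]
    return np.moveaxis(one_hot, -1, 0)

mask = np.array([[0, 2], [80, 1]])
probs = to_one_hot(mask, num_classes=81)  # shape (81, 2, 2)
```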
With this representation, you can apply the augmentation to each channel separately, using the same interpolation method as you use for the input image itself (usually bicubic).
You now have, for each target pixel, its probability of belonging to each of the 81 classes; these vectors are no longer 1-hot (due to the interpolation and the transformation). You can use argmax to convert this map back to a hard class assignment per pixel, or modify the loss function to work with these soft labels, which better captures the boundaries between different regions in the image.
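A self-contained NumPy sketch of the per-channel approach, assuming the geometric transform is a plain resize. It deliberately uses bilinear rather than bicubic interpolation: bilinear output is a convex combination of input pixels, so every channel stays in [0, 1] and the channels still sum to 1, whereas bicubic can overshoot slightly. `bilinear_resize` and `resize_label_mask` are hypothetical helpers, not a library API.

```python
import numpy as np

def bilinear_resize(channel, out_h, out_w):
    """Bilinearly resize a 2-D float array (half-pixel-centre convention)."""
    in_h, in_w = channel.shape
    # Input-space coordinates of each output pixel centre, clamped to the image.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    top = channel[y0][:, x0] * (1 - wx) + channel[y0][:, x1] * wx
    bottom = channel[y1][:, x0] * (1 - wx) + channel[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def resize_label_mask(mask, num_classes, out_h, out_w):
    """Resize an HxW class mask via a CxHxW one-hot map; return (soft, hard)."""
    one_hot = np.moveaxis(np.eye(num_classes)[mask], -1, 0)  # C x H x W
    # Interpolate every class channel independently with the same transform.
    soft = np.stack([bilinear_resize(c, out_h, out_w) for c in one_hot])
    hard = soft.argmax(axis=0)  # back to one class id per pixel
    return soft, hard

# Illustrative 4x4 mask with one class per quadrant, upscaled to 7x7.
mask = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 3, 3],
                 [2, 2, 3, 3]])
soft, hard = resize_label_mask(mask, num_classes=4, out_h=7, out_w=7)
```

`soft` is the soft-label target for a loss that accepts class probabilities, and `hard` is the argmax fallback. For the 0/255 mask in the question, C = 2, and taking the argmax is equivalent to keeping 255 wherever the bilinearly interpolated mask exceeds 127.5 and 0 elsewhere.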

Upvotes: 5

Amitay Nachmani

Reputation: 3279

If you want to resize images so that the resulting image contains only values that appear in the original, you can use nearest-neighbor interpolation.
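A NumPy sketch of nearest-neighbor resizing, where each output pixel copies the input pixel whose area contains its centre, so a 0/255 mask stays strictly 0/255. `nearest_resize` is a hypothetical helper. With OpenCV the usual call is `cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST)` (note that `dsize` is given width first); library implementations may round sample positions slightly differently, but every output value is still copied from the input.

```python
import numpy as np

def nearest_resize(mask, out_h, out_w):
    """Resize a 2-D mask by copying the nearest input pixel (no new values)."""
    in_h, in_w = mask.shape
    # Index of the input pixel that contains each output pixel's centre.
    rows = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return mask[rows[:, None], cols[None, :]]

mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
small = nearest_resize(mask, 3, 3)
# [[  0 255 255]
#  [255   0   0]
#  [255   0   0]]
```

The trade-off is that boundaries become blocky, since each output pixel takes a single input value instead of blending its neighbours.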

Upvotes: 3
