Reputation: 656
I have a binary mask for each image, where every mask pixel is either 0 or 255. Because my image segmentation needs images of a fixed size, I have to resize both the images and the masks. However, after resizing, some mask pixels end up with values greater than 0 but smaller than 255. How do I decide which value to assign to those pixels? The library requires that the mask contain only 0 or 255. Please help me, thank you very much.
Upvotes: 6
Views: 6388
Reputation: 114866
If you resize using any interpolation other than nearest-neighbor, you will indeed get intermediate values anywhere in the range [0, 255]. This is not necessarily a bad thing. If your loss function is cross-entropy, you can think of these values as "soft labels". That is, such a pixel does not have a "hard" assignment to one of the targets, but rather a "soft", probabilistic assignment to both.
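The soft-label idea can be sketched as follows. This is only an illustration under stated assumptions: averaging 2x2 blocks stands in for whatever interpolating resize you actually use, and the helper names `downscale_by_2` and `soft_bce` are made up for this example, not part of any library.

```python
import numpy as np

def downscale_by_2(mask):
    """Halve each dimension by averaging 2x2 blocks (assumed stand-in
    for any interpolating resize; H and W must be even)."""
    h, w = mask.shape
    blocks = mask.astype(np.float32).reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def soft_bce(pred, target, eps=1e-7):
    """Binary cross-entropy that accepts targets anywhere in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())

mask = np.array([[0,   0, 255, 255],
                 [0, 255, 255, 255],
                 [0,   0,   0, 255],
                 [0,   0,   0,   0]], dtype=np.uint8)

resized = downscale_by_2(mask)   # contains values strictly between 0 and 255
soft = resized / 255.0           # soft labels in [0, 1]

# If the library insists on 0/255, threshold at the midpoint instead.
hard = np.where(resized >= 127.5, 255, 0).astype(np.uint8)

# Example loss of a constant 0.5 prediction against the soft targets.
loss = soft_bce(np.full_like(soft, 0.5), soft)
```

Thresholding at the midpoint is one reasonable way to get back to a strict 0/255 mask; keeping `soft` as the target is the alternative when the loss can handle it.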
This generalizes to multi-label segmentation masks and to more complex geometric augmentations (e.g., rotations, affine transformations, ...). The question then becomes:
How to correctly apply augmentations to discrete target images?
For instance, you have a semantic segmentation mask with 81 classes; that is, each pixel has a value in {0, 1, ..., 80} indicating the class of that pixel.
These target masks are stored as indexed RGB images.
You want to apply some geometric augmentation to the input image and the target mask.
The "quick and dirty" way would be to use nearest-neighbor interpolation, as proposed by Amitay Nachmani.
A more "accurate" and "correct" way would be to convert the target mask from an HxW discrete (integer) mask to a CxHxW probability map (C = 81 here): each channel holds, for every pixel, the probability that the pixel belongs to the corresponding class.
Note that this is not the predicted segmentation (output of the net), but rather the targets the net should predict. This way you represent each target pixel as a 1-hot 81-dim vector.
With this representation, you can apply the augmentation to each channel separately, using the same interpolation method as you use for the input image itself (usually bicubic).
Now you have, for each target pixel, its probability of belonging to each of the 81 classes. These vectors are no longer 1-hot (due to the interpolation and the transformation). You can use argmax to convert this map back to hard per-pixel class assignments, or modify the loss function to work with these soft labels, which better captures the boundaries between different regions in the image.
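A minimal sketch of the one-hot, transform, argmax pipeline described above, using NumPy only. The assumptions are loud ones: a tiny 3-class mask replaces the 81-class case, 2x2 block averaging stands in for the real augmentation or bicubic resize, and the function names are hypothetical.

```python
import numpy as np

def to_one_hot(mask, num_classes):
    """HxW integer mask -> CxHxW float one-hot probability map."""
    one_hot = np.eye(num_classes, dtype=np.float32)[mask]  # HxWxC
    return one_hot.transpose(2, 0, 1)                      # CxHxW

def downscale_by_2(channel):
    """Average 2x2 blocks of one channel (assumed stand-in for the
    interpolation applied to the input image; H and W must be even)."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def transform_mask_soft(mask, num_classes):
    """Apply the same transform to every class channel separately."""
    one_hot = to_one_hot(mask, num_classes)
    return np.stack([downscale_by_2(c) for c in one_hot])

mask = np.array([[0, 0, 1, 1],
                 [0, 2, 1, 1],
                 [2, 2, 2, 1],
                 [2, 2, 1, 1]])

soft = transform_mask_soft(mask, num_classes=3)  # 3x2x2, no longer one-hot
hard = soft.argmax(axis=0)                       # back to a 2x2 integer mask
```

With an averaging transform the per-pixel vectors in `soft` still sum to 1, so they can be fed to a cross-entropy loss as soft targets. Interpolators with negative weights, such as bicubic, can produce values slightly outside [0, 1], so clipping and renormalizing may be needed in that case.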
Upvotes: 5
Reputation: 3279
If you want to resize images and want the resulting image to contain only values from the original range, you can use nearest-neighbor interpolation.
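To illustrate why nearest-neighbor preserves the original values, here is a small NumPy-only sketch. The function `resize_nearest` is hypothetical and uses a simple floor-based index mapping; image libraries offer the same mode (for example `cv2.resize` with `interpolation=cv2.INTER_NEAREST` in OpenCV), though their exact pixel-center conventions may differ from this sketch.

```python
import numpy as np

def resize_nearest(mask, out_h, out_w):
    """Resize an HxW mask by copying, for every output pixel, the value
    of one source pixel. No new values can appear."""
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h  # source row per output row
    cols = (np.arange(out_w) * in_w) // out_w  # source col per output col
    return mask[rows[:, None], cols]

mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255

resized = resize_nearest(mask, out_h=8, out_w=3)
values = set(np.unique(resized).tolist())  # subset of {0, 255}
```

Because each output pixel is a copy of an input pixel, the result keeps the input dtype and can only contain 0 and 255.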
Upvotes: 3