Aditya Mehrotra

Reputation: 333

Resizing a target segmentation map without converting individual pixel values to floats

I have a dataset with drone view images of size 4000x6000, grayscale. Each individual pixel value corresponds to a class (I have 20 classes in total), so a pixel value of 3 would mean "tree" for example. Using the original image, I can very easily create binary masks for all 20 of the classes by using equality operators in NumPy and I get pixel-perfect masks.

Here's an example of what one row would look like:

[[2, 2, 2, 2, ...... , 5, 5, 5]]
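The equality-operator masking described above can be sketched like this (a tiny stand-in array is used in place of the 4000x6000 map):

```python
import numpy as np

# Tiny stand-in for the 4000x6000 label map; a value of 3 means "tree".
labels = np.array([[2, 2, 2, 5, 5],
                   [2, 3, 3, 3, 5]], dtype=np.uint8)

# One pixel-perfect binary mask per class via an equality operator.
tree_mask = (labels == 3)
masks = {c: (labels == c) for c in np.unique(labels)}
```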

However, 4000x6000 is much too big for my purposes, and I want to resize these segmentation targets to something more manageable, such as 400x400 or 400x600. I've tried a few different Python libraries, but all of them interpolate my pixel values to floats, so I lose my segmentation map labels. Is there any method (not including cropping) where I can resize my segmentation target maps AND the original RGB input images without losing my labels?

Upvotes: 2

Views: 1407

Answers (1)

Shai

Reputation: 114866

When one resizes an image, one usually needs to interpolate pixel values (e.g., decide on the "intensity" at sub-pixel locations). Natural images tend to vary smoothly between pixels, which makes interpolation with large support very appealing (see detailed discussion here).
However, as you observed, interpolating between integer values of labels makes no sense at all.

Therefore, you can:

  1. Do not interpolate - use nearest-neighbor resizing for the label map.
    That is, use whatever interpolation method you like (LANCZOS, BICUBIC...) for the input image, but use NEAREST method for the label map.

  2. Interpolate the per-label probability maps - for each 4000x6000 label map, produce 20 per-class probability maps and interpolate them to the desired size (using the same interpolation method as used for the image: LANCZOS, BICUBIC...). Now, for each resized pixel you have a 20-dim target distribution. You can train with these "soft labels", or take the argmax and train with the most dominant label per pixel.
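Both options can be sketched with Pillow (the array here is shrunk from 4000x6000 for the demo; the 20-class setup is taken from the question):

```python
import numpy as np
from PIL import Image

# Stand-in label map with values 0..19, one per class.
rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=(400, 600), dtype=np.uint8)

# Option 1: nearest-neighbor resizing - every output pixel is an
# exact copy of some input label, so no new values appear.
small = Image.fromarray(labels).resize((60, 40), resample=Image.NEAREST)
small_labels = np.asarray(small)

# Option 2: build 20 per-class probability maps, interpolate each
# smoothly (BILINEAR here), stack into a per-pixel distribution,
# and optionally take the argmax for hard labels.
probs = [
    np.asarray(
        Image.fromarray((labels == c).astype(np.float32)).resize(
            (60, 40), resample=Image.BILINEAR
        )
    )
    for c in range(20)
]
soft = np.stack(probs, axis=-1)               # (40, 60, 20) soft targets
hard = soft.argmax(axis=-1).astype(np.uint8)  # dominant label per pixel
```

Because the interpolation weights sum to one at every output pixel, the 20 interpolated channels still sum to (approximately) 1, so `soft` remains a valid per-pixel distribution.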

Upvotes: 4

Related Questions