I need to do semantic segmentation with overlapping instances, e.g. a dataset with labels for "person" and "T-shirt". I want to use Mask2FormerForUniversalSegmentation, which seems to support this. However, I'm getting errors when using the Hugging Face Mask2FormerImageProcessor to prepare data for model inference and training.
For example:
import numpy as np
from transformers.image_utils import ChannelDimension
from transformers import Mask2FormerImageProcessor  # assumes torchvision is installed

processor = Mask2FormerImageProcessor(do_rescale=False, do_resize=False, do_normalize=False)
num_classes = 2
num_features = 5  # number of (potentially overlapping) masks per image
height, width = (16, 16)
images = [np.zeros((height, width, 3))]
# One channel per feature, so masks can overlap at a given pixel
segmentation_maps = [np.random.randint(0, num_classes, (height, width, num_features))]
batch = processor(images,
                  segmentation_maps=segmentation_maps,
                  return_tensors="pt",
                  input_data_format=ChannelDimension.LAST)
gives
ValueError: Unable to infer channel dimension format
I assume this fails because the (height, width, num_features) segmentation map has neither one nor three channels, so the processor cannot infer which axis is the channel axis. According to https://github.com/NielsRogge/Transformers-Tutorials/issues/296#issuecomment-1657815329, as of July 2023 Mask2FormerImageProcessor does not support overlapping features.
Has anything changed? Does anyone have a workaround for this?
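For reference, the direction I'm currently experimenting with (only a sketch, not verified to train correctly) is to skip segmentation_maps entirely and build the mask_labels / class_labels lists that Mask2FormerForUniversalSegmentation's forward() accepts directly, since those are per-instance binary masks and so can overlap. The class assignment below is hypothetical; in a real dataset each channel's class id would come from the labels:

import torch

# Continuing from the snippet above: build per-instance targets by hand.
# Mask2Former's forward() takes mask_labels (a list of
# (num_instances, height, width) float tensors of binary masks, which
# may overlap) and class_labels (a list of (num_instances,) long
# tensors of class ids), so the processor's segmentation_maps path
# can be avoided entirely.
seg = segmentation_maps[0]  # (height, width, num_features), values in {0, 1}
mask_labels = [torch.from_numpy(seg).permute(2, 0, 1).float()]  # (num_features, H, W)
# Hypothetical class ids: here every feature channel is assigned class 1
class_labels = [torch.ones(num_features, dtype=torch.long)]

# Use the processor for the image side only (no segmentation_maps)
pixel_batch = processor(images, return_tensors="pt",
                        input_data_format=ChannelDimension.LAST)

# outputs = model(pixel_values=pixel_batch["pixel_values"],
#                 mask_labels=mask_labels,
#                 class_labels=class_labels)

Is this a sound approach, or does the processor support something like this natively by now?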