I need to do semantic segmentation with overlapping instances, e.g. a dataset with labels for "person" and "T-shirt". I want to use Mask2FormerForUniversalSegmentation, which seems to support this. However, I'm getting errors when using the Hugging Face Mask2FormerImageProcessor to prepare data for model inference and training.
For example:
import numpy as np
from transformers.image_utils import ChannelDimension
from transformers import Mask2FormerImageProcessor  # assumes torchvision is installed

processor = Mask2FormerImageProcessor(do_rescale=False, do_resize=False, do_normalize=False)
num_classes = 2
num_features = 5  # number of (potentially overlapping) masks per image
height, width = (16, 16)
images = [np.zeros((height, width, 3))]
# One channel per feature, so masks can overlap at a given pixel
segmentation_maps = [np.random.randint(0, num_classes, (height, width, num_features))]
batch = processor(images,
                  segmentation_maps=segmentation_maps,
                  return_tensors="pt",
                  input_data_format=ChannelDimension.LAST)
gives
ValueError: Unable to infer channel dimension format
I assume this fails because the (height, width, num_features) segmentation map has neither one nor three channels, so the processor cannot infer which axis is the channel axis. According to https://github.com/NielsRogge/Transformers-Tutorials/issues/296#issuecomment-1657815329, as of July 2023 Mask2FormerImageProcessor does not support overlapping features.
Has anything changed? Does anyone have a workaround for this?
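For reference, the direction I'm currently experimenting with (only a sketch, not verified to train correctly) is to skip segmentation_maps entirely and build the mask_labels / class_labels lists that Mask2FormerForUniversalSegmentation's forward() accepts directly, since those are per-instance binary masks and so can overlap. The class assignment below is hypothetical; in a real dataset each channel's class id would come from the labels:

import torch

# Continuing from the snippet above: build per-instance targets by hand.
# Mask2Former's forward() takes mask_labels (a list of
# (num_instances, height, width) float tensors of binary masks, which
# may overlap) and class_labels (a list of (num_instances,) long
# tensors of class ids), so the processor's segmentation_maps path
# can be avoided entirely.
seg = segmentation_maps[0]  # (height, width, num_features), values in {0, 1}
mask_labels = [torch.from_numpy(seg).permute(2, 0, 1).float()]  # (num_features, H, W)
# Hypothetical class ids: here every feature channel is assigned class 1
class_labels = [torch.ones(num_features, dtype=torch.long)]

# Use the processor for the image side only (no segmentation_maps)
pixel_batch = processor(images, return_tensors="pt",
                        input_data_format=ChannelDimension.LAST)

# outputs = model(pixel_values=pixel_batch["pixel_values"],
#                 mask_labels=mask_labels,
#                 class_labels=class_labels)

Is this a sound approach, or does the processor support something like this natively by now?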