Reputation: 1
I want to count the total number of pixels for each segmented class. I only need the count per general object category: one class for all vehicles, one for all people, and so on. For this reason I'm using semantic segmentation instead of instance segmentation (which would treat each vehicle or person instance separately). But the output of semantic segmentation in detectron2 is not a binary mask.
I know the output of instance segmentation is a set of binary masks, and I can get the pixel count of each one with the following code:
masks = output['instances'].pred_masks
results = torch.sum(torch.flatten(masks, start_dim=1),dim=1)
This gives the pixel count, but it considers each vehicle instance separately, which I do not want. The output of semantic segmentation, on the other hand, is the field 'sem_seg', which contains predicted class probabilities for each general class rather than a binary mask. How can I get the pixel count for each class from semantic segmentation?
Upvotes: 0
Views: 116
Reputation: 530
Although it has been 7 months since the question was asked, I'm still answering in case you or someone else needs it.
As Christoph Rackwit mentioned, you can sum over the instances. I use that approach here to compute the total number of pixels per class, building on the code from your question that gives the pixel count of each instance from instance segmentation:
import locale

import torch
from detectron2.data import MetadataCatalog

# Class names in the order used for training; replace with your custom dataset's class names.
# `cfg` is the detectron2 config the predictor was built from.
pre_classes = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
instances = predictions["instances"].to("cpu")  # move to CPU so .numpy() also works after GPU inference
masks = instances.pred_masks  # one binary mask per predicted instance
classes = instances.pred_classes.numpy().tolist()  # class index of each instance (indexes into pre_classes)
results = torch.sum(torch.flatten(masks, start_dim=1), dim=1).numpy().tolist()  # pixel count of each instance
count = dict()  # maps class index -> total pixel count
for i in range(len(classes)):  # iterate over the predicted instances
    # add this instance's pixel count to the running total of its class (starts at 0 for a new class)
    count[classes[i]] = count.get(classes[i], 0) + results[i]
locale.setlocale(locale.LC_ALL, 'en_IN.UTF-8')  # Indian number format; this locale must be installed on the system
for k, v in count.items():  # iterate over the per-class totals
    # pre_classes[k] is the class name for index k; locale.format_string adds the digit grouping
    print(f"{pre_classes[k]} class contains {locale.format_string('%d', v, grouping=True)} pixels")
I used the predefined model to perform instance segmentation on the following image,
which resulted in the following image,
and which also produced the results you asked for:
dog class contains 1,39,454 pixels
cat class contains 95,975 pixels
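One caveat with summing per-instance areas: if two instances of the same class overlap, the shared pixels are counted twice. A minimal sketch of an alternative that merges the masks of each class before counting is below. The function name `union_pixel_count_per_class` is made up for illustration, and it assumes `pred_masks` has shape (N, H, W) and `pred_classes` has shape (N,), both on the CPU.

```python
import torch


def union_pixel_count_per_class(pred_masks: torch.Tensor,
                                pred_classes: torch.Tensor) -> dict:
    """Count pixels per class, counting overlapping instance pixels once.

    Hypothetical helper: assumes pred_masks is (N, H, W) and
    pred_classes is (N,) with one class index per instance.
    """
    pred_masks = pred_masks.bool()
    counts = {}
    for class_id in pred_classes.unique().tolist():
        # Select every mask that belongs to this class: shape (k, H, W).
        class_masks = pred_masks[pred_classes == class_id]
        # A pixel belongs to the class if any instance mask covers it.
        merged = class_masks.any(dim=0)
        counts[class_id] = int(merged.sum())
    return counts
```

With the variables from the code above this could be called as `union_pixel_count_per_class(masks, instances.pred_classes)`. When no instances overlap, it gives the same totals as the summing approach.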
I haven't had any hands-on experience with semantic segmentation, so this solution uses instance segmentation. If you still want to do this with semantic segmentation, please provide the weights, the inference method, and a test dataset so that I can help you out.
Anyway, I hope this is what you were looking for. If you have any questions about the code, the logic, or how it works, feel free to ask.
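For the semantic segmentation route, a possible starting point that has not been run against a real model: assuming 'sem_seg' is a tensor of shape (num_classes, H, W) holding a score per class for every pixel, as the question describes, taking the argmax over the class dimension gives one label per pixel, and `torch.bincount` then counts the pixels of each label. The function name `pixel_count_per_class` is made up for illustration.

```python
import torch


def pixel_count_per_class(sem_seg: torch.Tensor) -> torch.Tensor:
    """Return a 1-D tensor where entry c is the number of pixels assigned to class c.

    Hypothetical helper: assumes sem_seg has shape (num_classes, H, W).
    """
    num_classes = sem_seg.shape[0]
    # Pick the highest-scoring class for every pixel: shape (H, W).
    label_map = sem_seg.argmax(dim=0)
    # Count how often each label occurs; minlength keeps classes with zero pixels.
    return torch.bincount(label_map.flatten(), minlength=num_classes)
```

Usage would look like `counts = pixel_count_per_class(outputs["sem_seg"].to("cpu"))`, where `outputs` is assumed to be the prediction dict of a semantic segmentation model. The index `c` of `counts` then refers to the same class order the model was trained with.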
Upvotes: 0