I am training a custom object detector using Mask R-CNN with the TensorFlow Object Detection API, so I need to predict the object instance mask along with the bounding box.
Pre-trained model: mask_rcnn_inception_v2_coco
Here is a snapshot of my training log:
INFO:tensorflow:global step 4181: loss = 0.0031 (3.290 sec/step)
INFO:tensorflow:global step 4181: loss = 0.0031 (3.290 sec/step)
INFO:tensorflow:global step 4182: loss = 0.0030 (2.745 sec/step)
INFO:tensorflow:global step 4182: loss = 0.0030 (2.745 sec/step)
In this case, what does this loss represent?
My question is not about the training loss and how it varies with the number of steps.
I am just unclear about what this loss means when training a Mask R-CNN. A Mask R-CNN has 3 parallel heads at the last layer (classification, bounding-box regression, and mask prediction).
In such a case, what is the loss?
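The Mask R-CNN paper defines a multi-task loss on each sampled RoI as the sum of 3 losses, one per output head: classification, localization and segmentation mask (the TensorFlow Object Detection API additionally lets you weight each term in the pipeline config):
$L = L_{cls} + L_{box} + L_{mask}$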
The classification and bounding-box (localization) losses are the same as in Faster R-CNN.
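To make the two Faster R-CNN terms concrete, here is a minimal NumPy sketch for a single RoI, assuming the usual Fast/Faster R-CNN formulation: a softmax log loss over K+1 classes (index 0 = background) and a smooth L1 loss on the box deltas of the ground-truth class only. The function names and array layouts are hypothetical, chosen for illustration; this is not the TensorFlow Object Detection API code.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    # Elementwise smooth L1: quadratic for |x| < beta, linear beyond it.
    absx = np.abs(x)
    return np.where(absx < beta, 0.5 * x ** 2 / beta, absx - 0.5 * beta)

def classification_loss(class_logits, gt_class):
    # Softmax cross-entropy (log loss) over K+1 classes; index 0 is background.
    z = class_logits - np.max(class_logits)      # shift logits for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(-log_probs[gt_class])

def box_loss(box_deltas, gt_class, gt_deltas):
    # box_deltas: (K+1, 4) per-class regression outputs (tx, ty, tw, th).
    # Only the ground-truth class's deltas are penalized; background RoIs add 0.
    if gt_class == 0:
        return 0.0
    return float(np.sum(smooth_l1(box_deltas[gt_class] - gt_deltas)))
```

With two equal logits the classification loss is log 2, and a delta error of (0.5, 2, 0, 0) gives a smooth L1 of 0.125 + 1.5 = 1.625.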
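What is added is a per-pixel sigmoid with a binary loss for the mask. The mask branch generates a mask for each class, without competition among classes (so if you have 10 classes, the mask branch predicts 10 masks). For an RoI whose ground-truth class is k, only the k-th mask contributes to the loss, which is the average binary cross-entropy over that mask's pixels; the other classes' masks are ignored.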
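A minimal NumPy sketch of that mask loss for one RoI is below. It assumes the mask head outputs logits of shape (K, m, m), one m x m mask per foreground class, and that the ground-truth class is given as a 0-based index into those K masks; both conventions, and the function names, are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid_bce(logits, targets):
    # Numerically stable elementwise binary cross-entropy on logits:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

def mask_loss(mask_logits, gt_class, gt_mask):
    # mask_logits: (K, m, m) logits, one independent mask per class (no softmax).
    # gt_class: 0-based index of the RoI's ground-truth class.
    # gt_mask: (m, m) binary target mask for this RoI.
    logits_k = mask_logits[gt_class]   # only the ground-truth class's mask is used
    return float(np.mean(sigmoid_bce(logits_k, gt_mask)))
```

Because only `mask_logits[gt_class]` is read, whatever the other classes' masks predict has no effect on the loss, which is what "no competition among classes" means in practice.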
If you want to dive in a little bit deeper into the mask loss, the paper states that "Multinomial vs. Independent Masks: Mask R-CNN decouples mask and class prediction: as the existing box branch predicts the class label, we generate a mask for each class without competition among classes (by a per-pixel sigmoid and a binary loss). In Table 2b, we compare this to using a per-pixel softmax and a multinomial loss (as commonly used in FCN [30])."
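The difference between the two options in that quote shows up in the per-pixel probabilities. The sketch below (hypothetical helper names, NumPy only) contrasts a per-pixel softmax across classes, where classes compete and probabilities sum to 1 at every pixel, with a per-pixel sigmoid, where each class's mask is an independent binary problem.

```python
import numpy as np

def per_pixel_softmax(mask_logits):
    # Multinomial (FCN-style): at each pixel, classes compete and probabilities
    # over axis 0 (the class axis) sum to 1.
    z = mask_logits - mask_logits.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def per_pixel_sigmoid(mask_logits):
    # Independent (Mask R-CNN): each class gets its own foreground probability.
    return 1.0 / (1.0 + np.exp(-mask_logits))

# Three classes, 2x2 masks, all confidently "foreground".
logits = np.full((3, 2, 2), 5.0)
soft = per_pixel_softmax(logits)     # every entry is 1/3: classes split the pixel
indep = per_pixel_sigmoid(logits)    # every entry is about 0.993 for each class
```

With the softmax, a pixel cannot be confidently foreground for several class masks at once; with the sigmoid it can, so the class decision is left to the classification head.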
You can find this comparison in the paper on page 6, Table 2b ("Multinomial vs. Independent Masks").