Vijay Prakash Dwivedi

Reputation: 23

While training Mask RCNN using TensorFlow Object Detection API, what is the 'loss'?

I am training a custom object detector using Mask RCNN in the TensorFlow Object Detection API, so I need to predict the object instance mask along with the bounding box.

Pre-trained model : mask_rcnn_inception_v2_coco

Following is a snapshot of my training.

INFO:tensorflow:global step 4181: loss = 0.0031 (3.290 sec/step)

INFO:tensorflow:global step 4182: loss = 0.0030 (2.745 sec/step)

In this case, can you please tell me what the loss here is?

My question is not about the training loss and how it varies with the steps.

I am just unclear about what this loss means when training a Mask RCNN. In a Mask RCNN, there are 3 parallel heads at the last layer (class, bounding box, and mask).

In such a case, which loss is being reported?

Upvotes: 2

Views: 1269

Answers (1)

Mark.F

Reputation: 1694

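The loss function in the Mask R-CNN paper is the sum of 3 losses (one per output), classification, localization and segmentation mask: L = L_cls + L_box + L_mask. An implementation may give each term its own weight, so in general it is a weighted sum.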

The classification and bounding-box (localization) losses are the same as in Faster R-CNN.
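To make the combined loss concrete, here is a minimal sketch of how three per-head losses could be folded into one scalar. The function name, the weights and the component values are illustrative assumptions; they are not read from the API or from the training log in the question.

```python
def total_loss(l_cls, l_box, l_mask, w_cls=1.0, w_box=1.0, w_mask=1.0):
    """Weighted sum of the three head losses.

    With all weights equal to 1 this reduces to L = L_cls + L_box + L_mask.
    """
    return w_cls * l_cls + w_box * l_box + w_mask * l_mask


# Made-up component values, for illustration only.
example = total_loss(l_cls=0.0012, l_box=0.0009, l_mask=0.0010)
print(f"total loss = {example:.4f}")
```

A single number like the one in the log is therefore an aggregate; to see how much each head contributes you would need the individual terms, not the total.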

What is added is the mask loss: a per-pixel sigmoid followed by a binary cross-entropy loss. The mask branch generates a mask for each class, without competition among classes (so if you have 10 classes the mask branch predicts 10 masks).
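As a rough illustration of the mask loss, the sketch below applies a per-pixel sigmoid to the mask of the ground-truth class and averages the binary cross-entropy over its pixels. It follows the paper's description, in which only the mask of the ground-truth class contributes to the loss. The function names, the tiny 2x2 mask size and the inputs are assumptions made for the example, and this is not the code the Object Detection API runs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on one class's mask.

    mask_logits: list of per-class masks, each an H x W list of raw logits.
    gt_mask:     H x W list of 0/1 ground-truth values for this RoI.
    gt_class:    index of the RoI's ground-truth class.
    """
    logits = mask_logits[gt_class]  # masks of other classes are ignored
    eps = 1e-7                      # keeps log() away from 0
    total, count = 0.0, 0
    for logit_row, gt_row in zip(logits, gt_mask):
        for z, y in zip(logit_row, gt_row):
            # Per-pixel sigmoid: no softmax across classes.
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# Two classes, 2x2 masks; the RoI belongs to class 0.
logits = [
    [[0.0, 0.0], [0.0, 0.0]],   # class 0 mask logits
    [[5.0, 5.0], [5.0, 5.0]],   # class 1 mask logits (do not affect the loss)
]
gt = [[1, 0], [0, 1]]
loss = mask_loss(logits, gt, gt_class=0)
```

Because each class gets its own sigmoid output, changing the logits of class 1 in this example leaves the loss unchanged, which is what "without competition among classes" means in practice.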

If you want to dive in a little bit deeper into the mask loss, the paper states that "Multinomial vs. Independent Masks: Mask R-CNN decouples mask and class prediction: as the existing box branch predicts the class label, we generate a mask for each class without competition among classes (by a per-pixel sigmoid and a binary loss). In Table 2b, we compare this to using a per-pixel softmax and a multinomial loss (as commonly used in FCN [30])."

You can find this on page 6 of the paper, in Table 2b ("Multinomial vs. Independent Masks").

Upvotes: 1
