Shamane Siriwardhana

Reputation: 4201

What is the loss function of Mask R-CNN?

The paper clearly states that the classification and regression losses are identical to those of the RPN network in Faster R-CNN. Can someone explain the mask loss function? How do they use an FCN to improve on this?

Upvotes: 6

Views: 21401

Answers (2)

Saleem Ahmed

Reputation: 2999

The multi-task loss function of Mask R-CNN combines the classification, localization, and segmentation mask losses: $L = L_{cls} + L_{box} + L_{mask}$, where $L_{cls}$ and $L_{box}$ are the same as in Faster R-CNN.
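As a rough illustration of how the three terms add up for one sampled RoI, here is a minimal sketch. It assumes the box-head losses that Faster R-CNN inherits from Fast R-CNN (log loss for $L_{cls}$, smooth L1 for $L_{box}$), sets all loss weights to 1, and treats $L_{mask}$ as an already-computed number. The function names are hypothetical. Following the Mask R-CNN paper, only positive (non-background) RoIs get the box and mask terms.

```python
import math


def smooth_l1(x: float) -> float:
    """Smooth L1 penalty, as used for box regression in Fast R-CNN."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def roi_multitask_loss(class_probs, gt_class, box_pred, box_target, l_mask):
    """L = L_cls + L_box + L_mask for one sampled RoI (illustrative sketch).

    class_probs: probabilities over K + 1 classes, index 0 = background
    gt_class:    ground-truth class index of the RoI
    box_pred:    predicted offsets (tx, ty, tw, th) for gt_class
    box_target:  regression targets for the same four offsets
    l_mask:      mask loss of this RoI, computed separately
    """
    # Classification: log loss on the ground-truth class
    l_cls = -math.log(class_probs[gt_class])
    if gt_class == 0:
        # Background RoIs contribute only to the classification term
        return l_cls
    # Box regression: smooth L1 summed over the four offsets
    l_box = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return l_cls + l_box + l_mask
```

The early return for background RoIs is the design point worth noticing: a background region has no ground-truth box or mask, so only the classifier learns from it.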

The mask branch generates an m × m mask for each RoI and each class, with K classes in total. Thus, the total output per RoI is of size $K \cdot m^2$.
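To make the $K \cdot m^2$ figure concrete, the sketch below splits one RoI's flat mask output into K separate m × m masks. The class-major layout and the example values K = 80, m = 28 are assumptions chosen for illustration; frameworks typically keep this as a (K, m, m) array per RoI rather than a flat list.

```python
def per_roi_mask_output_size(num_classes: int, m: int) -> int:
    """Number of values the mask branch outputs for one RoI: K * m^2."""
    return num_classes * m * m


def split_class_masks(flat, num_classes, m):
    """Split a flat mask-branch output of length K * m^2 into K masks of m x m.

    Assumes a class-major layout: all m * m values for class 0, then class 1, ...
    """
    if len(flat) != per_roi_mask_output_size(num_classes, m):
        raise ValueError("expected K * m^2 values")
    masks = []
    for k in range(num_classes):
        start = k * m * m
        masks.append([flat[start + i * m : start + (i + 1) * m] for i in range(m)])
    return masks


# Example with assumed values: K = 80 classes, m = 28
print(per_roi_mask_output_size(80, 28))  # 62720
```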

Because the model is trying to learn a mask for each class, there is no competition among classes for generating masks.

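$L_{mask}$ is defined as the average binary cross-entropy loss, only including the k-th mask if the region is associated with the ground-truth class k:

$$
L_{mask} = -\frac{1}{m^2} \sum_{1 \le i, j \le m} \left[ y_{ij} \log \hat{y}^k_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}^k_{ij}\right) \right]
$$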

where $y_{ij}$ is the label of cell (i, j) in the true mask for the region of size m × m, and $\hat{y}^k_{ij}$ is the predicted value of the same cell in the mask learned for the ground-truth class k.
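A minimal sketch of $L_{mask}$ for a single RoI, assuming the mask branch emits raw logits and $\hat{y}^k_{ij}$ is their sigmoid. The function names are hypothetical. The per-cell term uses the standard numerically stable rewrite of binary cross-entropy on logits, which is algebraically equal to $-[y \log \sigma(z) + (1 - y)\log(1 - \sigma(z))]$. Only the k-th mask is read, so the other K - 1 masks do not enter this RoI's loss.

```python
import math


def bce_with_logits(z: float, y: float) -> float:
    """-[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))], computed stably."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits, gt_mask, k):
    """Average per-cell binary cross-entropy on the k-th class mask only.

    mask_logits: K masks for one RoI, each an m x m grid of raw logits
    gt_mask:     m x m grid of 0/1 labels y_ij
    k:           ground-truth class of the RoI
    """
    pred = mask_logits[k]  # the other K - 1 masks are ignored for this RoI
    m = len(gt_mask)
    total = sum(
        bce_with_logits(pred[i][j], gt_mask[i][j])
        for i in range(m)
        for j in range(m)
    )
    return total / (m * m)  # the 1 / m^2 average from the equation
```

In practice a vectorized, fused op such as PyTorch's `torch.nn.functional.binary_cross_entropy_with_logits` would replace the Python loops.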

Upvotes: 7

rkellerm

Reputation: 5512

FCN uses a per-pixel softmax and a multinomial loss. This means that the mask prediction task (the boundaries of the object) and the class prediction task (what the masked object is) are coupled.
Mask R-CNN decouples these tasks: the existing bounding-box head (the localization head, as in Faster R-CNN) predicts the class, and the mask branch generates a mask for each class without competition among classes (e.g. with 21 classes, the mask branch predicts 21 separate masks instead of FCN's single mask with 21 channels). The loss used is a per-pixel sigmoid with a binary cross-entropy loss.
Bottom line: it's sigmoid in Mask R-CNN vs. softmax in FCN.
(See Table 2b in the Mask R-CNN paper, ablation section.)
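To illustrate the coupling described above, here is a single-pixel sketch comparing the two losses. The logits and function names are made up for illustration and are not taken from the FCN or Mask R-CNN code. Raising only class 0's logit lowers class 1's softmax probability (and raises its FCN-style loss), while class 1's sigmoid probability and Mask R-CNN-style loss stay the same.

```python
import math


def softmax(logits):
    """FCN-style: one distribution over classes per pixel (sums to 1)."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Mask R-CNN-style: an independent probability for each class mask."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fcn_pixel_loss(logits, target_class):
    """Multinomial cross-entropy: -log softmax(logits)[target_class]."""
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum_exp - logits[target_class]


def mask_rcnn_pixel_loss(logits, k, y):
    """Binary cross-entropy on class k's mask only (y is 0 or 1)."""
    z = logits[k]
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# One pixel's logits over 3 classes; then raise only class 0's logit
before = [1.0, 1.0, -1.0]
after = [3.0, 1.0, -1.0]
print(softmax(before)[1], softmax(after)[1])  # class 1's probability drops
print(sigmoid(before[1]), sigmoid(after[1]))  # class 1's probability is unchanged
print(fcn_pixel_loss(before, 1), fcn_pixel_loss(after, 1))  # loss rises
print(mask_rcnn_pixel_loss(before, 1, 1), mask_rcnn_pixel_loss(after, 1, 1))  # unchanged
```

Because the sigmoid loss only looks at mask k, deciding which class the object belongs to is left entirely to the box head's classifier.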

Upvotes: 11
