Reputation: 121
I have read dozens of articles about YOLO but haven't found an answer to this. Faster R-CNN uses ROI pooling to rescale anchors before the fully connected layers, but YOLO doesn't. Some people say YOLO doesn't need ROI pooling because it doesn't have an RPN, but YOLO does have anchors with different sizes and aspect ratios, each one trying to detect an object. How can a neural network be trained with these anchors of different sizes? YOLO calculates a confidence score and a class score, but I can't understand how that is possible without reshaping the anchors.
Upvotes: 1
Views: 522
Reputation: 2117
You mention both fully connected layers and anchor boxes in YOLO. Only the first version of YOLO had fully connected layers, and it had no anchors. YOLOv2 and YOLOv3 are both fully convolutional networks, without any fully connected layers but with anchors.
In the first YOLO, the width and height of each box were predicted directly, relative to the width and height of the input image. YOLOv2 and YOLOv3 use anchor boxes, and the network only learns a rescaling of each anchor's width and height, in the same manner as in e.g. Faster R-CNN and SSD.
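To make the rescaling concrete, here is a minimal sketch of the box decoding described in the YOLOv2 paper. The anchors are fixed (width, height) constants that only appear in this arithmetic; nothing is cropped from the feature map per anchor, so there is nothing for ROI pooling to resize. The function name `decode_box`, the anchor values, and the grid cell are made up for illustration and are not taken from any particular implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, anchor):
    """Turn one raw prediction (tx, ty, tw, th) into a box in grid-cell units.

    cell   -- (cx, cy), top-left corner of the grid cell that made the prediction
    anchor -- (pw, ph), prior width and height of the anchor (a constant)
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = cx + sigmoid(tx)   # centre stays inside the predicting cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)  # learned rescaling of the anchor width
    bh = ph * math.exp(th)  # learned rescaling of the anchor height
    return bx, by, bw, bh


# Hypothetical anchors in grid-cell units (illustrative values only).
anchors = [(1.0, 1.0), (2.0, 4.0), (5.0, 2.5)]

# The same cell outputs one (tx, ty, tw, th) vector per anchor; here all zeros,
# which decodes to each anchor's own size centred in the cell.
raw = (0.0, 0.0, 0.0, 0.0)
boxes = [decode_box(raw, (3, 4), a) for a in anchors]
for box in boxes:
    print(box)
```

Each anchor simply gets its own group of output channels in the final convolution, so every anchor's prediction has the same shape no matter how large or elongated the anchor is.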
Upvotes: 1