Kev1n91

Reputation: 3693

Why does the bounding box of an object detection CNN have to be parallel to the image borders?

Looking at recent advances in object recognition using deep learning, such as Mask R-CNN or YOLO, I noticed that the bounding box of an object is always parallel to the image borders.

Is this only due to the annotations of the provided training data, such as COCO, or is it due to the underlying architecture? Looking at the last layers of YOLO or R-CNN, shouldn't it be possible to train on rectangles that are rotated just like the object in the image?

Upvotes: 2

Views: 789

Answers (1)

pietz

Reputation: 2553

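These models usually predict a center point (x and y) plus a width and a height. Four numbers like that can only describe an axis-aligned rectangle, which explains the aligned output. If the training data provides a different label format, for example one that also includes a rotation angle, and the output layer is adapted to match, it should be possible to learn other kinds of bounding boxes as well.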
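To make the parameterization point concrete, here is a minimal sketch in plain Python. The function name `box_corners` and the extra `angle` parameter are my own illustration, not part of YOLO or Mask R-CNN; the idea is only that with four outputs the corners are necessarily axis-aligned, while a hypothetical fifth output (an angle in radians) would be enough to describe a rotated rectangle.

```python
import math

def box_corners(cx, cy, w, h, angle=0.0):
    """Return the four corners of a box from center, size and rotation (radians).

    `angle` is a hypothetical fifth regression target; with the usual
    four-value output (cx, cy, w, h) it is implicitly fixed at 0.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Corner offsets relative to the center, before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by `angle`, then translate by the center.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]

# Four-parameter output: the box can only be axis-aligned.
aligned = box_corners(50, 40, 20, 10)

# Same box with an assumed predicted angle of 30 degrees.
rotated = box_corners(50, 40, 20, 10, math.radians(30))
```

Training on such labels would also need annotations that contain the angle and a loss that handles it, which datasets like COCO do not provide out of the box as far as the question describes them.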

Upvotes: 2
