Why does the bounding box of an object detection CNN have to be parallel to the image borders?

Question

Looking at recent advances in object recognition using deep learning, such as Mask R-CNN or YOLO, I noticed that the bounding box of an object is always parallel to the image borders.

Is this only due to the annotations in the provided training data, such as COCO, or is it due to the underlying architecture? Looking at the last layers of YOLO or R-CNN, shouldn't it be possible to train on rectangles that are rotated just like the object in the image?

pietz · Accepted Answer

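These models usually predict a center point (x, y) together with a width and a height. Those four numbers can only describe a rectangle whose edges are parallel to the image borders, which explains the aligned output. If the training data provides another form of label, such as an additional rotation angle, it should be easily possible to learn rotated bounding boxes as well.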
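To make the point concrete, here is a minimal sketch that decodes a box from its center, width and height, plus a hypothetical fifth output `theta` (a rotation angle in radians). The function name `box_corners` and the `theta` output are illustrative assumptions, not part of YOLO or Mask R-CNN, and the sketch ignores the grid-cell and anchor offsets that real detectors regress against. With `theta = 0` it reduces to the usual four-value box, whose edges are necessarily horizontal and vertical; a nonzero `theta` yields a rotated rectangle.

```python
import math


def box_corners(cx, cy, w, h, theta=0.0):
    """Return the four corners of a box centered at (cx, cy).

    theta is a hypothetical extra regression output (rotation in radians).
    With theta == 0 this is the ordinary (x, y, w, h) box, which can only
    be axis-aligned.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets relative to the center, before any rotation
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in offsets:
        # Rotate the offset by theta, then translate it to the center
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


# Four outputs: the edges can only be horizontal or vertical
aligned = box_corners(50, 40, 20, 10)
# Five outputs: the same box rotated by 30 degrees
rotated = box_corners(50, 40, 20, 10, math.radians(30))
print(aligned)
print(rotated)
```

Nothing in the network itself forces `theta` to be absent; the limitation comes from what the labels (and hence the loss) ask the final layer to predict.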
