Reputation: 3693
Looking at recent advances in object recognition using deep learning, such as Mask R-CNN or YOLO, I noticed that the bounding box of an object is always parallel to the image borders.
Is this only due to the annotations of the provided training data, such as COCO, or is it due to the underlying architecture? Looking at the last layers of YOLO or R-CNN, shouldn't it be possible to train on rectangles that are rotated just like the object in the image?
Upvotes: 2
Views: 789
Reputation: 2553
These models usually predict a center point (x, y) together with a width and a height. That parameterization contains no angle, which explains the axis-aligned output. If the training data provided a different form of label, it should be readily possible to learn other bounding boxes as well.
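
To make the difference concrete, here is a minimal sketch of what a rotated label could look like, assuming one extra angle value is added to the usual (cx, cy, w, h) target. The function names and the angle convention (radians, about the box center) are my own assumptions for illustration, not the format used by YOLO, Mask R-CNN, or COCO.

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Corners of a w-by-h box centred at (cx, cy), rotated by angle_rad.

    Hypothetical label format: the usual (cx, cy, w, h) plus one angle.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets from the centre before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply a standard 2D rotation to each offset, then shift to the centre.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]

def enclosing_axis_aligned_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# With angle 0 the rotated box and the axis-aligned box coincide.
corners = rotated_box_corners(10.0, 20.0, 4.0, 2.0, 0.0)
print(enclosing_axis_aligned_box(corners))
```

With four numbers per box only the axis-aligned case can be expressed; a fifth regression output for the angle (plus labels that contain it) is the part that would have to be added.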
Upvotes: 2