Reputation: 1720
This question may already have been answered, but I couldn't find a simple answer. I created a convnet using Keras to classify The Simpsons characters (dataset here).
I have 20 classes: given an image as input, I return the character's name. It's pretty simple. Each picture in my dataset shows one main character, and the only label is that character's name.
Now I would like to add an object detection task, i.e. draw a bounding box around the characters in a picture and predict which character each one is. I don't want to use a sliding window because it's really slow, so I thought about using Faster RCNN (github repo) or YOLO (github repo). Do I have to add bounding-box coordinates for every picture in my training set? Is there a way to do object detection (and get bounding boxes at test time) without providing coordinates for the training set?
In short, I would like to create a simple object detection model, and I don't know whether it's possible to build a simpler version of YOLO or Faster RCNN.
Thank you very much for any help.
Upvotes: 15
Views: 14649
Reputation: 41
You may already have a suitable architecture in mind: "Now I would like to add an object detection task, i.e. draw a bounding box around characters in the picture and predict which character it is."
So just split the task into two parts:
1. Add an object detector for person detection to return bounding boxes
2. Classify bounding boxes using the convnet you already trained
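The two-stage pipeline above can be sketched as follows. This is a minimal illustration, not a specific library's API: `detect_then_classify` and `crop` are hypothetical helpers, and `fake_detector` / `fake_classifier` are toy stand-ins. In practice `detect_fn` would wrap a pretrained YOLO or Faster-RCNN model returning person boxes, and `classify_fn` would resize the crop to your convnet's input size and call the trained Keras model's `predict` method. With a NumPy image you would simply slice `image[y_min:y_max, x_min:x_max]`.

```python
from typing import Callable, List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed convention).
Box = Tuple[int, int, int, int]


def crop(image: Sequence[Sequence], box: Box) -> list:
    """Cut the box region out of an image stored row-major (a list of rows)."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def detect_then_classify(
    image: Sequence[Sequence],
    detect_fn: Callable[[Sequence[Sequence]], List[Box]],
    classify_fn: Callable[[list], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: find character boxes. Stage 2: name each crop with the classifier."""
    results = []
    for box in detect_fn(image):
        patch = crop(image, box)
        results.append((box, classify_fn(patch)))
    return results


# Toy 4x4 "image" and stand-in models, purely for illustration.
toy_image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [2, 2, 0, 0],
             [2, 2, 0, 0]]


def fake_detector(img):
    # A real detector would predict these boxes; here they are hard-coded.
    return [(2, 0, 4, 2), (0, 2, 2, 4)]


def fake_classifier(patch):
    # A real classifier would be the trained convnet; here we look at one pixel.
    return "homer_simpson" if patch[0][0] == 1 else "bart_simpson"


print(detect_then_classify(toy_image, fake_detector, fake_classifier))
# [((2, 0, 4, 2), 'homer_simpson'), ((0, 2, 2, 4), 'bart_simpson')]
```

Keeping the detector and classifier as separate callables means you can swap the person detector without retraining the character classifier.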
For part 1, you should be good to go by using a feature extractor (for example, a convnet pretrained on COCO or ImageNet) with an object detector on top (again, YOLO or Faster-RCNN) to detect people. However, you may find that people in cartoons (let's say the Simpsons count as people) are not detected properly, because the feature extractor was trained on real photographs rather than cartoon images. In that case, you could follow the usual transfer-learning approach and re-train a few layers of the feature extractor on cartoon pictures so that it learns cartoon features.
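The "re-train a few layers" step usually means freezing most of the pretrained backbone and leaving only its last few layers trainable. A minimal sketch, assuming Keras-style layer objects: in Keras each layer has a boolean `trainable` attribute, so you could pass `model.layers` to the hypothetical helper `freeze_all_but_last` below, then recompile the model so the change takes effect. `FakeLayer` is only a stand-in so the example runs without a framework, and how many layers to unfreeze is a value you would tune.

```python
def freeze_all_but_last(layers, n_trainable):
    """Freeze every layer except the last `n_trainable` ones.

    Works on any objects exposing a boolean `trainable` attribute
    (for example, Keras `model.layers`). Returns the same list.
    """
    if n_trainable < 0:
        raise ValueError("n_trainable must be non-negative")
    cutoff = len(layers) - n_trainable  # index of the first layer to keep trainable
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return layers


class FakeLayer:
    """Stand-in for a framework layer, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


backbone = [FakeLayer(f"conv{i}") for i in range(6)]
freeze_all_but_last(backbone, 2)
print([(layer.name, layer.trainable) for layer in backbone])
```

Unfreezing only the top layers keeps the generic low-level filters learned on photographs while letting the higher layers adapt to cartoon-specific shapes and colors.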
Upvotes: 4
Reputation: 258
The goal of YOLO or Faster RCNN is to output bounding boxes. So, in short, yes: you will need to label your training data with bounding boxes to train them.
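To make "label the data" concrete: each training image needs one box and one class per character. As one example, the Darknet/YOLO text format stores one line per object, `class x_center y_center width height`, with coordinates normalized to [0, 1] by the image size (Faster-RCNN implementations often read other formats, such as Pascal VOC XML or COCO JSON). The helper `to_yolo_line` below is a hypothetical sketch that converts a pixel box `(x_min, y_min, x_max, y_max)` into such a line; the class id to character mapping is up to you.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    Output: "class x_center y_center width height", all coordinates
    normalized to [0, 1] by the image width/height.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} is outside a {img_w}x{img_h} image")
    x_c = (x_min + x_max) / 2 / img_w  # box center, as a fraction of width
    y_c = (y_min + y_max) / 2 / img_h  # box center, as a fraction of height
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: character with class id 3 in a 400x400 picture.
print(to_yolo_line(3, (100, 50, 300, 250), 400, 400))
# 3 0.500000 0.375000 0.500000 0.500000
```

Since your current dataset already has one main character per picture with a known name, you would only need to draw one box per image; the class id comes from the label you already have.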
Take a shortcut:
Upvotes: 13