A. Attia

Reputation: 1720

Object detection using Keras: simple way for Faster R-CNN or YOLO

This question may already have been answered, but I couldn't find a simple answer to it. I created a convnet using Keras to classify The Simpsons characters (dataset here).
I have 20 classes: given an image as input, the model returns the character's name. It's pretty simple. Each picture in my dataset shows one main character and is labeled only with that character's name.

Now I would like to add an object detection task, i.e., draw a bounding box around each character in the picture and predict which character it is. I don't want to use a sliding window because it's really slow, so I thought about using Faster R-CNN (github repo) or YOLO (github repo). Do I have to add bounding box coordinates for each picture in my training set? Is there a way to do object detection (and get bounding boxes on my test images) without providing coordinates for the training set?

In short, I would like to create a simple object detection model, but I don't know whether it's possible to build a simpler YOLO or Faster R-CNN.

Thank you very much for any help.

Upvotes: 15

Views: 14649

Answers (2)

Michelagio

Reputation: 41

You may already have a suitable architecture in mind: "Now I would like to add an object detection task i.e draw a bounding box around characters in the picture and predict which character it is."

So you just split the task into two parts:
1. Add an object detector for person detection to return bounding boxes
2. Classify bounding boxes using the convnet you already trained

For part 1, you should be good to go with a feature extractor (for example, a convnet pretrained on COCO or ImageNet) with an object detector (such as YOLO or Faster R-CNN) on top to detect people. However, you may find that people in cartoons (let's say the Simpsons are people) are not detected reliably, because the feature extractor was trained on real photographs rather than cartoon images. In that case, you could try retraining a few layers of the feature extractor on cartoon pictures so that it learns cartoon features, following the usual transfer learning approach.
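
A minimal sketch of how the two parts could be glued together, assuming the detector and the classifier are available as plain Python callables. `detect_fn` and `classify_fn` are hypothetical names, not part of any library; in practice `detect_fn` would wrap whichever detector you pick and `classify_fn` would resize the crop and call the convnet you already trained.

```python
import numpy as np


def crop_boxes(image, boxes):
    """Crop (x1, y1, x2, y2) boxes out of an H x W x C array.

    Boxes are clipped to the image bounds; boxes that end up empty are dropped.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, int(x1)), min(w, int(x2))
        y1, y2 = max(0, int(y1)), min(h, int(y2))
        if x2 > x1 and y2 > y1:
            crops.append(((x1, y1, x2, y2), image[y1:y2, x1:x2]))
    return crops


def detect_and_classify(image, detect_fn, classify_fn):
    """Two-stage pipeline: detect boxes, then classify each crop.

    detect_fn:   image -> list of (x1, y1, x2, y2)   (hypothetical wrapper)
    classify_fn: crop  -> (label, score)             (hypothetical wrapper)
    """
    results = []
    for box, crop in crop_boxes(image, detect_fn(image)):
        label, score = classify_fn(crop)
        results.append({"box": box, "label": label, "score": score})
    return results
```

With this split, the classifier never needs bounding box labels; only the detector does, and a generic person detector may be enough as a starting point.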

Upvotes: 4

Andrew Tu

Reputation: 258

The goal of YOLO or Faster R-CNN is to predict bounding boxes. So in short: yes, you will need to label the data with boxes to train it.

You can take a shortcut, though:

  • 1) Label a handful of bounding boxes by hand (let's say 5 per character).
  • 2) Train Faster R-CNN or YOLO on this very small dataset.
  • 3) Run your model against the full dataset.
  • 4) It will get some boxes right and a lot of them wrong.
  • 5) Retrain the model on the images it bounded correctly; your training set should be much bigger now.
  • 6) Repeat until you get the result you want.
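
The loop above can be sketched roughly as follows. This is only an illustration: `train_fn` and `predict_fn` are hypothetical callables standing in for your detector's training and inference code, and using a confidence threshold to decide which boxes count as "correct" is an assumption on my part. Checking the accepted boxes by eye is safer, since a confident model can still be wrong.

```python
def bootstrap_labels(labeled, unlabeled, train_fn, predict_fn,
                     threshold=0.9, max_rounds=5):
    """Grow a box-labeled training set by self-training.

    labeled:    dict image_id -> box, the hand-labeled seed set
    unlabeled:  iterable of image_ids without boxes
    train_fn:   dict of labels -> model                (hypothetical)
    predict_fn: (model, image_id) -> (box, score)      (hypothetical)
    """
    labeled = dict(labeled)
    remaining = set(unlabeled) - set(labeled)
    model = train_fn(labeled)            # step 2: train on the small seed set
    for _ in range(max_rounds):
        accepted = {}
        for image_id in list(remaining):  # step 3: run on everything else
            box, score = predict_fn(model, image_id)
            if score >= threshold:        # step 4: keep only confident boxes
                accepted[image_id] = box
        if not accepted:                  # nothing new to learn from
            break
        labeled.update(accepted)
        remaining -= set(accepted)
        model = train_fn(labeled)         # step 5: retrain on the bigger set
    return model, labeled
```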

Upvotes: 13
