user7115722

How to create bounding boxes around the ROIs using TensorFlow

I'm using Inception v3 and TensorFlow to identify objects within an image. However, it just produces a list of likely objects, and I need it to report their positions in the image as well.

I'm following the flowers tutorial: https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html

bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/flower_photos

Upvotes: 7

Views: 5276

Answers (3)

Pet Detective

Reputation: 86

Putting bounding boxes around objects is usually called detection in the lingo of the field, and there is a whole category of networks designed for it. There's a separate category in the PASCAL VOC competition for detection, and that's a good place to find strong detection networks.

My favorite detection network (which is the current leader on the 2012 PASCAL VOC dataset) is YOLO, which starts with a typical classifier, but then has some extra layers to support bounding boxes. Instead of just returning a class, it produces a downsampled version of the original image, where each pixel has its own class. Then it has a regression layer that predicts the exact position and size of the bounding boxes. You can start with a pre-trained classifier, and then modify it into a YOLO network and retrain it. The procedure is described in the original paper about YOLO.
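To make the idea concrete, here is a simplified sketch (plain NumPy, not actual YOLO code; the grid size, one-box-per-cell layout, and confidence threshold are illustrative assumptions) of how a grid prediction gets decoded into pixel-space bounding boxes:

```python
import numpy as np

def decode_grid(pred, img_size=448, conf_thresh=0.5):
    """Decode an S x S x 5 grid of (x, y, w, h, confidence) into boxes.

    Simplified sketch: x, y are offsets within each cell; w, h are
    fractions of the image. Real YOLO predicts multiple boxes per cell
    plus class probabilities; this keeps only the geometric idea.
    """
    S = pred.shape[0]
    cell = img_size / S                      # cell size in pixels
    boxes = []
    for row in range(S):
        for col in range(S):
            x_off, y_off, w, h, conf = pred[row, col]
            if conf < conf_thresh:           # discard low-confidence cells
                continue
            cx = (col + x_off) * cell        # box centre in pixels
            cy = (row + y_off) * cell
            bw, bh = w * img_size, h * img_size
            # convert centre/size to top-left corner + size
            boxes.append((cx - bw / 2, cy - bh / 2, bw, bh, conf))
    return boxes
```

For example, a 2x2 grid whose cell (0, 1) predicts a confident box centred in that cell decodes to a single box in image coordinates; all other cells fall below the threshold and are dropped.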

I like YOLO because it has a simple structure, compared to other detection networks, it allows you to use transfer learning from classification networks (which makes it easier to train), and the detection speed is very fast. It was actually developed for real-time detection in video.

There is an implementation of YOLO in TensorFlow, if you would like to avoid using the custom darknet framework used by the authors of YOLO.

Upvotes: 2

Dellein

Reputation: 343

By default, Inception does not output coordinates. There are specific tools for that, such as Faster R-CNN, which is available for Caffe.

If you want to stick with TensorFlow, you can retrain Inception to output the coordinates, provided you have human-annotated images.

Upvotes: 2

nessuno

Reputation: 27050

Inception is a classification network, not a localization network.

You need another architecture to predict the bounding boxes, such as R-CNN and its newer (and faster) variants, Fast R-CNN and Faster R-CNN.

Optionally, if you want to use Inception and you have a training set annotated with class and bounding box coordinates, you can add a regression head to Inception and make the network learn to regress the bounding box coordinates. It's the same idea as transfer learning, but you use the last convolutional layer's output as a feature extractor and train this new head to regress 4 coordinates + 1 class for every bounding box in your training set.
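A minimal sketch of what such a head computes, in plain NumPy with made-up feature and weight shapes (in practice these would be trainable TensorFlow layers sitting on top of Inception's pooled convolutional features):

```python
import numpy as np

def regression_head(features, W_box, b_box, W_cls, b_cls):
    """Map a pooled feature vector to 4 box coordinates + class scores.

    features: (D,) global-average-pooled conv features.
    Returns (box, class_probs): box as (x, y, w, h) squashed into [0, 1]
    with a sigmoid, class probabilities via softmax. Shapes here are
    illustrative; a real head is trained jointly with a box-regression
    loss and a classification loss.
    """
    # box branch: linear layer + sigmoid keeps coordinates in [0, 1]
    box = 1.0 / (1.0 + np.exp(-(W_box @ features + b_box)))
    # class branch: linear layer + numerically stable softmax
    logits = W_cls @ features + b_cls
    exp = np.exp(logits - logits.max())
    return box, exp / exp.sum()
```

At training time you would minimize, say, a squared error between the predicted box and the annotated coordinates plus a cross-entropy on the class probabilities, updating only the head's weights while the Inception backbone stays frozen (or is fine-tuned with a small learning rate).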

Upvotes: 6
