Ashley

Reputation: 497

Neural Networks - Multiple object detection in one image with confidence

I understand how CNNs work for classification problems, such as on the MNIST dataset, where each image represents a hand-written digit. Images are evaluated, and classifications are given with some confidence.

I would like to know what approach I should take if I wish to identify several objects in one image, with a confidence for each. For example - if I evaluated an image of a cat and a dog, I would like a high confidence for both 'cat' and 'dog'. I do not care where the object is in the picture.

My current knowledge would lead me to build a dataset of images containing JUST dogs, and a dataset of images containing JUST cats. I would retrain the top layer of, say, the Inception V3 network, and it would be able to identify which images are of cats and which images are of dogs.

The problem with this is that evaluating an image of a dog and a cat will lead to 50% dog and 50% cat - because it is trying to classify the image, but I want to 'tag' the image (ideally reaching ~100% dog, ~100% cat).

I have briefly looked at region-based CNNs, which address a similar problem, but I don't care where in the picture the objects are - just that they can each be identified.

What approaches exist to solve this problem? I would like to achieve this in Python using something like TensorFlow or Keras.

Upvotes: 2

Views: 2609

Answers (2)

HanClinto

Reputation: 9461

I know this is an old question, but in case it shows up in the front page of any Google searches for anyone else (like it did for me), I figured I could chime in with something helpful.

The final layer of InceptionV3 is a Softmax function. Its outputs always sum to 1, so the labels compete with each other and the network effectively says "this is either label A or label B."

However, if you want to modify something like Inception for multi-label classification, instead of using Softmax for your final layer, you want to swap it out for something like Sigmoid, so that each label is measured on its own merits (and not compared against its neighbors).
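A minimal tf.keras sketch of that swap, assuming TensorFlow 2.x and a hypothetical two-label set (cat, dog). It reuses InceptionV3 without its softmax top, adds one sigmoid unit per label, and trains with binary cross-entropy. weights=None keeps the sketch runnable offline; for real transfer learning you would pass weights="imagenet" instead.

```python
import numpy as np
import tensorflow as tf

LABELS = ["cat", "dog"]  # hypothetical label set for this example

# Convolutional base without InceptionV3's own softmax classifier.
# weights=None avoids a download; use weights="imagenet" for transfer learning.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, input_shape=(299, 299, 3), pooling="avg"
)
base.trainable = False  # retrain only the new top layer

# One independent sigmoid unit per label instead of a softmax over all labels.
outputs = tf.keras.layers.Dense(len(LABELS), activation="sigmoid")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

# Binary cross-entropy treats each label as its own yes/no problem.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)

# Training targets would be multi-hot, e.g. [1, 1] for an image with both animals.
probs = model.predict(np.zeros((1, 299, 299, 3), dtype="float32"), verbose=0)
```

Because each output is an independent sigmoid, the two probabilities do not have to sum to 1, so an image with both animals can score close to 1 on both.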

More information about the reasoning behind this (along with full instructions about how to modify retrain.py) can be found here:

https://towardsdatascience.com/multi-label-image-classification-with-inception-net-cbb2ee538e30

The add_final_training_ops() method originally added a new softmax and fully-connected layer for training. We just need to replace the softmax function with a different one.

Why?

The softmax function squashes all values of a vector into the range [0, 1] so that they sum to 1, which is exactly what we want in single-label classification. But in our multi-label case, we would like the resulting class probabilities to be able to express that an image of a car belongs to class car with 90% probability and to class accident with 30% probability, etc. We can achieve that by using, for example, the sigmoid function. Specifically, we will replace:
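To see the difference numerically, here is a small NumPy sketch. The logits are made-up values (not output from a real network) for a case where the network is equally confident about two labels:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits: the network is equally sure about both labels
logits = np.array([4.0, 4.0])  # [cat, dog]

print(softmax(logits))  # [0.5 0.5]: labels compete and must sum to 1
print(sigmoid(logits))  # both values about 0.982: each label judged on its own
```

Softmax splits the probability mass, which is exactly the 50% / 50% result described in the question; sigmoid lets both labels approach 1.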

final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

with:

final_tensor = tf.nn.sigmoid(logits, name=final_tensor_name)

We also have to update the way cross entropy is calculated to properly train our network:

Again, simply replace softmax with sigmoid:

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=ground_truth_input, logits=logits)
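For intuition about what that loss computes, here is a NumPy re-implementation for illustration only (it is not TensorFlow's code). It uses the numerically stable formula given in the TensorFlow documentation for tf.nn.sigmoid_cross_entropy_with_logits, max(x, 0) - x * z + log(1 + exp(-|x|)), with logits x and labels z. The labels and logits are made-up values.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(labels, logits):
    """Per-element loss using the stable form max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

# Two images, two labels [cat, dog]; targets are multi-hot, not one-hot
labels = np.array([[1.0, 1.0],   # cat and dog
                   [0.0, 1.0]])  # dog only
logits = np.array([[3.0, 2.0],
                   [-1.5, 0.5]])

loss = sigmoid_cross_entropy_with_logits(labels, logits)  # shape (2, 2): one term per label
mean_loss = loss.mean()  # averaged into a single scalar, like tf.reduce_mean
```

Each label contributes its own binary cross-entropy term, independent of the other labels, so the loss never forces the labels to compete.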

Upvotes: 3

Tin Luu

Reputation: 1687

First, to make this easy to understand, imagine you have two separate neural networks: one only identifies whether a cat is in the image, and the other only identifies whether a dog is in the image. Each network can surely learn its own yes/no task pretty well.

More interesting, those two networks can be combined into a single network that shares weights and has two outputs, one for cat and one for dog. To do that, you just need to notice:

  • The two classes (cat and dog) can appear in the same image, so [cat_label, dog_label] can be any of {[0, 0], [0, 1], [1, 0], [1, 1]}. This is unlike MNIST or an ordinary classification model, where [cat_label, dog_label] is one-hot: {[0, 1], [1, 0]}.
  • When you predict, choose a threshold for each output to decide whether that animal appears. For example, if y_cat > 0.5 and y_dog > 0.5, then both a cat and a dog are in the image.
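Both points can be sketched in a few lines of NumPy. The class order and the 0.5 threshold are illustrative assumptions; in practice you might tune a separate threshold per class.

```python
import numpy as np

CLASSES = ["cat", "dog"]  # hypothetical label order

def encode(tags):
    """Multi-hot target: 1.0 for every class present in the image."""
    return np.array([1.0 if c in tags else 0.0 for c in CLASSES])

def decode(probs, threshold=0.5):
    """Return every class whose sigmoid output exceeds the threshold."""
    return [c for c, p in zip(CLASSES, probs) if p > threshold]

print(encode({"cat", "dog"}))  # [1. 1.]
print(encode({"dog"}))         # [0. 1.]
print(encode(set()))           # [0. 0.]

print(decode([0.93, 0.88]))    # ['cat', 'dog']
print(decode([0.12, 0.71]))    # ['dog']
```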

Hope this helps!

Upvotes: 2
