Beutler
Beutler

Reputation: 83

How do you use a trained neural net to identify multiple objects in an image?

I've been exploring neural networks and have been able to successfully train a network even on my own images in a way to label individual pictures as certain things, but don't know how to use that trained network to identify and perhaps return multiple objects from one image. For example, if you trained cats and dogs, and one image has multiple cats and dogs, how would you apply the trained network and return their location (in the image)?

Here is the main tutorial I followed for implementation in Python: http://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/

A general answer would suffice, as in, is a sliding window over the image the best solution for this or is there something easier?

A specific example (particularly in python) would be appreciated. I've been using matplotlib for most of the image work, so I'd prefer to stay away from PIL slicing.

Thanks!

Upvotes: 3

Views: 1521

Answers (1)

Enigma
Enigma

Reputation: 329

As you want to use your existing trained n/w:

  1. Brute Sliding window: you will have to process many windows (slide by pixel based on image size) if you don't know the size and location of the object in the image, and each window may produce different outcomes and may be one or few of those are the final required results, do you see how the complexity increases. There will be difficulty in identifying the actual required outcomes among many.
  2. Preprocessing: images can be preprocessed before feeding it to the network. For instance, take an image with a monkey and a snake, calculate energy (Sobel et.al) of the image. Monkeys footprint in the image is more like round balloon (more area) and snake would be thread-like (less area), based on this have a python script to crop the image to that particular section, then feed this to the n/w. You can think of other preprocessing techniques.

If you are open to other n/w's, check out CRF as Recurrent Neural Networks. Ex: https://github.com/torrvision/crfasrnn

Hope this helps.

Upvotes: 2

Related Questions