StringTheory

Reputation: 117

Recognizing multiple objects in an image with convolutional neural networks

I've seen quite a few CNN code examples for identifying images, but they generally involve a one-to-one relationship between input and target (like the MNIST handwritten digits set), and most seem to use the same image dimensions (in pixels) for the input image and the training images.

So, what is the usual approach for identifying multiple objects in one image (several people, say, or any other relatively complex scene)? I've seen it done often enough, but haven't seen the design approaches discussed. Does this require some type of preprocessing, or can it be handled directly by a CNN?

Upvotes: 2

Views: 1937

Answers (1)

LFX

Reputation: 27

I would say the best-known family of techniques for retrieving multiple objects from an image is the Detection family.

With Detection, the basic idea is to generate one or more Proposal windows of different sizes and aspect ratios within an image, using either a deterministic (calculated) or a randomized algorithm.

For each Proposal window, the Classification algorithm is then run to determine what that specific area of the image represents.

The next step would usually be to run a Merge process that combines all neighbouring areas into a single classification output.

Note: A None class is often also used to represent an area with no specific class found.
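To make the three steps concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions of mine: proposals come from a fixed sliding window (the deterministic variant), `classify` is a hypothetical callable standing in for your trained CNN, and the Merge step is done with greedy non-maximum suppression, which is one common way to merge overlapping windows but not the only one. All function names are made up for this example.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]          # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, str, float]       # (box, class label, score)


def sliding_windows(width: int, height: int,
                    sizes: Sequence[Tuple[int, int]],
                    stride: int) -> Iterator[Box]:
    """Step 1: yield Proposal windows of several sizes and ratios."""
    for w, h in sizes:
        for y in range(0, height - h + 1, stride):
            for x in range(0, width - w + 1, stride):
                yield (x, y, x + w, y + h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def merge(detections: List[Detection], iou_threshold: float) -> List[Detection]:
    """Step 3: greedy non-maximum suppression (an assumed choice of Merge).

    Keep the highest-scoring window first, then drop any later window of the
    same class that overlaps an already kept one too strongly.
    """
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(det[1] != k[1] or iou(det[0], k[0]) < iou_threshold
               for k in kept):
            kept.append(det)
    return kept


def detect(image: Any, width: int, height: int,
           classify: Callable[[Any, Box], Tuple[str, float]],
           sizes: Sequence[Tuple[int, int]], stride: int,
           iou_threshold: float = 0.5) -> List[Detection]:
    """Run proposals -> classification -> merge on one image."""
    detections: List[Detection] = []
    for box in sliding_windows(width, height, sizes, stride):
        label, score = classify(image, box)   # Step 2: hypothetical CNN call
        if label != "none":                   # skip the None class
            detections.append((box, label, score))
    return merge(detections, iou_threshold)
```

In practice `classify` would crop the window out of the image, resize it to whatever input size the network was trained on, and return the top class with its score. That resizing is also what lets the training images and the input image have different dimensions.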

Upvotes: 3
