Starting point for image recognition?

Question

I have a set of 274 color images (each one is 200x150 pixels). Each image is visually distinct. I would like to build an app which accepts an up/down-scaled version of one of the base set of images and determines the closest match.

I'm a senior software engineer but am totally new to image recognition. I'd really appreciate any recommendations as to where to start.

yhenon · Accepted Answer

If you're comparing extremely similar images, it's in theory sufficient to calculate the Euclidean distance between the two images. The images must be the same size for this, so one of them usually has to be rescaled first (generally the larger image is scaled down). Aliasing can creep in at this step, so pay some attention to your downsampling algorithm. Images with different aspect ratios are a further complication.
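To make the brute-force approach concrete, here is a minimal sketch in plain Python. It assumes images are nested lists of (r, g, b) tuples, that the query is larger than the references by an exact integer factor, and that a simple box filter (block averaging) is an acceptable way to limit aliasing. The function names are made up for illustration; a real app would more likely lean on an imaging library.

```python
import math

def downscale_box(img, new_w, new_h):
    """Shrink an image by averaging each block of source pixels (box filter).

    Assumes the source width and height are exact multiples of the target size.
    """
    src_h, src_w = len(img), len(img[0])
    fx, fy = src_w // new_w, src_h // new_h
    out = []
    for y in range(new_h):
        row = []
        for x in range(new_w):
            block = [img[y * fy + dy][x * fx + dx]
                     for dy in range(fy) for dx in range(fx)]
            # Average each channel over the block instead of picking one pixel.
            row.append(tuple(sum(p[c] for p in block) / len(block)
                             for c in range(3)))
        out.append(row)
    return out

def euclidean_distance(a, b):
    """Euclidean distance over every pixel and channel of two same-size images."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must have the same dimensions")
    total = 0.0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += sum((ca - cb) ** 2 for ca, cb in zip(pa, pb))
    return math.sqrt(total)

def closest_match(query, references):
    """Return the index of the reference image nearest to the query."""
    ref_h, ref_w = len(references[0]), len(references[0][0])
    if (len(query), len(query[0])) != (ref_h, ref_w):
        query = downscale_box(query, ref_w, ref_h)
    distances = [euclidean_distance(query, ref) for ref in references]
    return min(range(len(references)), key=distances.__getitem__)
```

If the query is smaller than the references, the same idea applies in the other direction: shrink the references to the query's size before comparing.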

However, this is almost never done in practice since it's extremely slow. For N images of size WxH with 3 color channels, each query requires N x W x H x 3 comparisons, which quickly becomes unworkable (consider that many users can have over 1000 images of size >1000x1000).

Generally we attempt to reduce the image to a much smaller array that captures the image information more compactly, called a visual descriptor: for example, taking a 1024x1024x3 image and reducing it to a vector of length 128. The descriptors need only be calculated once for the reference images and can then be stored in an appropriate data structure. We then compare the descriptor of the query image against the descriptors of the reference images.
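The compute-once, compare-many pattern might look roughly like this. The `DescriptorIndex` class and the `mean_color_descriptor` function are hypothetical names, and the mean-color descriptor (length 3) is only a stand-in to keep the sketch short; it is far too weak for real use. Images are again assumed to be nested lists of (r, g, b) tuples.

```python
import math

def mean_color_descriptor(img):
    """Placeholder descriptor: the mean of each color channel (length 3)."""
    pixels = [p for row in img for p in row]
    return tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))

class DescriptorIndex:
    """Stores one precomputed descriptor per reference image."""

    def __init__(self, describe):
        self.describe = describe   # function mapping an image to a vector
        self.entries = []          # list of (label, descriptor) pairs

    def add(self, label, img):
        # Computed once per reference image, then reused for every query.
        self.entries.append((label, self.describe(img)))

    def query(self, img):
        # Only the query's descriptor is computed at lookup time.
        q = self.describe(img)
        label, _ = min(self.entries, key=lambda e: math.dist(q, e[1]))
        return label

index = DescriptorIndex(mean_color_descriptor)
index.add("red", [[(250, 5, 5)] * 2] * 2)
index.add("blue", [[(5, 5, 250)] * 2] * 2)
match = index.query([[(240, 20, 10)] * 4] * 4)  # a larger, reddish query
```

A linear scan is fine for a few hundred references; the "appropriate data structure" only starts to matter with much larger collections.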

The cost of calculating the distances over our dataset of N images with a descriptor of length L is then N x L, instead of the original N x W x H x 3.
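Plugging in the numbers from the question (274 images at 200x150) and the example descriptor length of 128 gives a feel for the difference. The helper names are just for illustration.

```python
def comparisons_raw(n, w, h, channels=3):
    """Per-query comparisons when matching raw pixels: N x W x H x channels."""
    return n * w * h * channels

def comparisons_descriptor(n, length):
    """Per-query comparisons when matching descriptors: N x L."""
    return n * length

raw = comparisons_raw(274, 200, 150)
desc = comparisons_descriptor(274, 128)
print(raw, desc, raw / desc)  # prints: 24660000 35072 703.125
```

So for this dataset the descriptor approach does roughly 700 times less work per query, not counting the one-off cost of computing the query's descriptor.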

So the issue is to find efficient descriptors that (a) are cheap to compute and (b) capture the image accurately. This is still an active area of research, but I can suggest some:

Histograms are probably the simplest way to do this, although they do very poorly with any illumination change and capture only color information, not spatial information. Make sure you normalise your histogram before doing any comparison.
Perceptual hashing works well with very similar images or slightly cropped images. See here
GIST descriptors are powerful but more complex; see here
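As an illustration of the histogram option, here is a rough sketch of a normalised color histogram and one possible way to compare two of them. It assumes 8-bit (r, g, b) pixels in nested lists and a bin count that divides 256. The function names are hypothetical, and histogram intersection is just one of several reasonable similarity measures, not something the suggestions above prescribe.

```python
def color_histogram(img, bins_per_channel=4):
    """Normalised joint RGB histogram as a flat list of bins_per_channel**3 values."""
    bin_width = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for row in img:
        for r, g, b in row:
            idx = ((r // bin_width) * bins_per_channel
                   + g // bin_width) * bins_per_channel + b // bin_width
            hist[idx] += 1
            count += 1
    # Dividing by the pixel count makes images of different sizes comparable.
    return [v / count for v in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalised histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

small = [[(200, 10, 10), (10, 200, 10)], [(10, 10, 200), (200, 200, 10)]]
# The same picture blown up 2x by pixel repetition.
large = [[small[y // 2][x // 2] for x in range(4)] for y in range(4)]
similarity = histogram_intersection(color_histogram(small), color_histogram(large))
```

The normalisation step is what lets the 2x2 and 4x4 versions above produce the same histogram: without it, the larger image would have four times the count in every bin.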
