Chris Parry

Reputation: 3057

Transforming an image before extracting SIFT features

Is there any advantage to transforming an image before computing SIFT features? For example, I am trying to match a "target" image of a banana:

[image: target image of a banana]

...to a "scene" image which also contains a banana, but in some unknown orientation and perspective.

First approach: extract SIFT features from the target image, match them to SIFT features in the scene image and compute a homography.
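A minimal sketch of this first approach with OpenCV (the file names, ratio threshold and RANSAC parameters below are illustrative assumptions, not part of the question):

```python
# Minimal sketch of approach 1 (file names are placeholders).
import cv2
import numpy as np

target = cv2.imread("banana_target.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(target, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

# Lowe's ratio test to drop ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_t, des_s, k=2)
        if m.distance < 0.75 * n.distance]

# A homography needs at least 4 point correspondences.
if len(good) >= 4:
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```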

Second approach: transform the target image in various ways to simulate changes of perspective:

[image: the target warped to simulate several changes of perspective]

...before extracting SIFT features from each transform. Combine the extracted features and then match them to the scene and compute a homography.
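A hedged sketch of what the second approach could look like: warp the target with a few synthetic perspective changes, extract SIFT features from each warp, and map the keypoint locations back into the original target frame so all features can feed a single homography estimate. The jitter values and warping scheme here are illustrative assumptions:

```python
# Sketch of approach 2: synthesize warped views, extract SIFT from each,
# and express every keypoint in the original target's coordinates.
import cv2
import numpy as np

target = cv2.imread("banana_target.png", cv2.IMREAD_GRAYSCALE)
h, w = target.shape
sift = cv2.SIFT_create()

corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
pooled_pts, pooled_des = [], []
for jitter in (0.0, 0.1, 0.2):          # 0.0 keeps the frontal view
    # Perturb the corners to simulate a modest change of perspective.
    noise = np.random.uniform(-1, 1, corners.shape).astype(np.float32)
    moved = corners + jitter * noise * np.float32([w, h])
    M = cv2.getPerspectiveTransform(corners, moved.astype(np.float32))
    warped = cv2.warpPerspective(target, M, (w, h))

    kp, des = sift.detectAndCompute(warped, None)
    if des is None:
        continue
    # Map keypoint locations back through the inverse warp so all
    # descriptors share the target's coordinate frame.
    pts = np.float32([k.pt for k in kp]).reshape(-1, 1, 2)
    back = cv2.perspectiveTransform(pts, np.linalg.inv(M)).reshape(-1, 2)
    pooled_pts.append(back)
    pooled_des.append(des)

pts_pool = np.vstack(pooled_pts)  # one (x, y) in target coords per descriptor
des_pool = np.vstack(pooled_des)  # pooled descriptors, matched against the scene
```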

Is there any advantage to approach 2, in terms of fidelity of feature matching?

Upvotes: 1

Views: 800

Answers (3)

old-ufo

Reputation: 2860

It will definitely help. Two papers which improve on SIFT a lot are based on this principle. The first, ASIFT, simulates a very large number of views and then matches the images n x n. It is much more robust than SIFT, and MUCH slower.

The second, MODS, performs the view synthesis iteratively (only when needed) and uses Hessian-Affine and MSER as detectors, which improves robustness and speed over ASIFT. In fact, the banana example appears in the MODS paper:

[image: banana matching example from the MODS paper]

At both links you can find the papers and the source code (C++).
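For intuition, here is a rough sketch of the view-synthesis idea (illustrative only, not the ASIFT/MODS authors' code; the tilt values and rotation sampling are simplified assumptions):

```python
# Rough sketch of ASIFT-style affine view synthesis: each simulated view is an
# in-plane rotation followed by a "tilt", i.e. an anisotropic shrink of one axis.
import cv2
import numpy as np

def synthesize_views(img, tilts=(2.0, 4.0)):
    """Return affine-warped copies of img that simulate out-of-plane tilts."""
    h, w = img.shape[:2]
    views = [img]                        # always keep the frontal view
    for t in tilts:
        # Sample in-plane rotations more densely for stronger tilts.
        for phi in np.arange(0.0, 180.0, 72.0 / t):
            R = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), float(phi), 1.0)
            rotated = cv2.warpAffine(img, R, (w, h))
            # Anti-alias along the axis about to be compressed, then "tilt"
            # by shrinking that axis by the factor t.
            blurred = cv2.GaussianBlur(rotated, (0, 0),
                                       sigmaX=0.01,
                                       sigmaY=0.8 * np.sqrt(t * t - 1.0))
            tilted = cv2.resize(blurred, (w, max(1, int(round(h / t)))),
                                interpolation=cv2.INTER_LINEAR)
            views.append(tilted)
    return views

# Run the detector on every synthesized view and pool the features.
sift = cv2.SIFT_create()
views = synthesize_views(cv2.imread("banana_target.png", cv2.IMREAD_GRAYSCALE))
features = [sift.detectAndCompute(v, None) for v in views]
```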

Upvotes: 1

dynamic

Reputation: 48141

SIFT is good when you have textured objects... A banana is mainly a shape with a single color.

In this condition keypoint extraction (SIFT, etc.) will fail, no matter what.

Upvotes: 0

onemasse

Reputation: 6584

I'd guess no, but you never know until you try. SIFT is about as good as it gets when it comes to reliability. If there were any benefit to this, I'd guess someone would already have implemented it as an improved algorithm.

I guess it also depends on the size of the blobs the algorithm detects. I'm more familiar with SURF, but I know SIFT works similarly. Both algorithms detect blobs at different scales. When the perspective changes, I'd guess the bigger blobs will fail to match while the smaller blobs will continue to be effective.

Also, if you transform the image and then extract features, and the transform isn't significant enough, the transformed features will be too similar to the originals and the matching algorithm will discard both the original and the transformed feature. That's because matching works by excluding all candidate matches except one that is X times better than the next best match.
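A toy illustration of that point, assuming a Lowe-style 0.75 ratio threshold: when a scene descriptor's two nearest neighbours in the pooled target set are a feature and its near-duplicate from a mild warp, the two distances are almost tied and the match is rejected as ambiguous. The numbers below are made up for illustration:

```python
# Toy illustration of the ratio-test point: matching a scene descriptor
# against a pool containing a target feature and a near-duplicate of it.
import numpy as np

def passes_ratio_test(query, candidates, ratio=0.75):
    d = np.linalg.norm(candidates - query, axis=1)
    best, second = np.sort(d)[:2]
    return best < ratio * second        # keep only clearly unambiguous matches

scene_desc  = np.array([1.00, 0.00, 0.00])
original    = np.array([0.98, 0.02, 0.00])  # would match fine on its own
transformed = original + 0.01               # near-duplicate from a small warp

# The two nearest neighbours are almost tied, so the match is rejected.
pool = np.stack([original, transformed])
print(passes_ratio_test(scene_desc, pool))  # False -> both features wasted
```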

Upvotes: 1
