Matt Cashatt

Reputation: 24208

Using C#, how do I search for images in a Windows file system like TinEye.com does on the web?

Hi and thanks for looking!

Update

For the sake of clarity, a third-party .NET library is just fine. Preferably an open-source or free one. The solution need not be native .NET.

Background

I am working on an enterprise web application for which the client has given us thousands of pages of content in MS Word documents. We have to parse these documents, extract the data, and send it to the content database.

Within these docs are various embedded images, each representing a larger original image stored in a separate folder.

The client did not provide any paths to the original source images, so when we see content with an embedded image in the MS Word doc, we have to go through several "assets" folders and look for the corresponding image, which is extraordinarily time-consuming.

We are already using DocX to parse the documents, so you can assume that we have a list of bitmap images to loop through that we have pulled from the document.

Question

Given a list of bitmaps that we just extracted from the document, how do we search a different folder containing hundreds of images for the matching image and then return the file path to it?

TinEye.com does this over the web. I am wondering if, using System.Drawing or something, we can do it on a PC with C#.

Thanks!

Matt

Upvotes: 3

Views: 1616

Answers (3)

Matt Cashatt

Reputation: 24208

Hate to propose an answer to my own question, but I think I might be on to something here. Here is heuristic/pseudo code for a C# forms app--your thoughts are appreciated:

Part 1

  1. Using System.IO, traverse the "assets" folders and get all images.
  2. For each image, Base64 encode it.
  3. Take the resulting string and place it in an XML file:
<Image>
     <Path>C:\SomePath</Path>
     <EncodedString>[Some Base64 String]</EncodedString>
</Image>

Now we have an XML file containing all original images, in Base64 form, along with their file path.
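
Part 1 could be sketched roughly as follows. This is only a minimal sketch, not the code that was actually run: the class name ImageIndexBuilder, the wrapping <Images> root element, and the list of file extensions are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ImageIndexBuilder
{
    // Extensions treated as images (an assumption; adjust for the real asset folders).
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".png", ".jpg", ".jpeg", ".gif", ".bmp" };

    // Walks rootFolder recursively and builds one <Image> element per image file,
    // holding the file path and the Base64-encoded file contents.
    public static XDocument Build(string rootFolder)
    {
        var images = Directory
            .EnumerateFiles(rootFolder, "*", SearchOption.AllDirectories)
            .Where(path => ImageExtensions.Contains(Path.GetExtension(path)))
            .Select(path => new XElement("Image",
                new XElement("Path", path),
                new XElement("EncodedString",
                    Convert.ToBase64String(File.ReadAllBytes(path)))));

        // <Images> is a hypothetical root element wrapping the <Image> entries.
        return new XDocument(new XElement("Images", images));
    }
}
```

Calling ImageIndexBuilder.Build(@"C:\SomePath").Save("index.xml") would then write the index to disk. Note that Base64 makes each entry about a third larger than the file it came from, so the XML file can get big with hundreds of images.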

Part 2

  1. Using DocX, extract all images from MS Word Doc.
  2. For each image, Base64 encode it and use LINQ to XML to search for an exact match in the XML file from Part 1.
  3. If there are no exact matches, start iterating the XML file and computing the Levenshtein distance between the extracted image's string and each stored string.
  4. While in the foreach, store the XML node Id (or file path) and Levenshtein distance as a key/value pair in an object.
  5. Take the k/v pair with the lowest LD score and return the file path.
  6. For performance, set tolerance so that the foreach stops if a certain original image has an acceptably low LD score when compared to the image extracted from the document.
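
Here is a rough sketch of Part 2, assuming the index is an XDocument whose <Image> elements each hold a <Path> and an <EncodedString>. The names ImageMatcher, FindBestMatch, and tolerance are hypothetical, and the distance function is a plain textbook Levenshtein, not the Sift library's FastDistance(). For brevity it tracks only the running best match instead of storing every key/value pair.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ImageMatcher
{
    // Textbook two-row Levenshtein distance; time is O(a.Length * b.Length).
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            (previous, current) = (current, previous); // reuse the two rows
        }
        return previous[b.Length];
    }

    // Returns the path of the closest indexed image, or null if the index is empty.
    public static string FindBestMatch(XDocument index, byte[] extractedImage, int tolerance)
    {
        string needle = Convert.ToBase64String(extractedImage);
        var entries = index.Descendants("Image").ToList();

        // Step 2: look for an exact match first.
        var exact = entries.FirstOrDefault(e => (string)e.Element("EncodedString") == needle);
        if (exact != null) return (string)exact.Element("Path");

        // Steps 3-6: fall back to the lowest Levenshtein distance,
        // stopping early once a candidate is within the tolerance.
        string bestPath = null;
        int bestDistance = int.MaxValue;
        foreach (var entry in entries)
        {
            int distance = Levenshtein(needle, (string)entry.Element("EncodedString"));
            if (distance < bestDistance)
            {
                bestDistance = distance;
                bestPath = (string)entry.Element("Path");
            }
            if (bestDistance <= tolerance) break;
        }
        return bestPath;
    }
}
```

Be aware that an exact Levenshtein comparison is quadratic in the string lengths, so on Base64 strings of full-size images each comparison can be very slow. A faster approximate distance is probably needed in practice.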

Since this is a one-off task, I don't need instant performance. So, I could run this tonight before leaving the office and, hopefully, come back tomorrow to a list of paths connecting the original images to the ones embedded in the docs.

UPDATE

The heuristic above worked beautifully! I ended up using the Sift library to efficiently calculate distances between Base64 strings. Specifically, I used their FastDistance() method. I am getting 100% accuracy on finding the images I need, even if the angle from which the photo was taken is slightly different.

Upvotes: 2

dgvid

Reputation: 26633

According to this SO answer to a similar question, you should look at OpenCV and VLFeat. The former has a C++ API and the latter a C API, so you would need to write your own P/Invoke wrapper or perhaps wrap them in a C++/CLI facade, which you could call from C#.

Upvotes: 0

Chris Shain

Reputation: 51329

There is no built-in algorithm in the .NET Framework for measuring image similarity. You'd need to use a third-party library or do it yourself. There are lots of image similarity algorithm questions on SO:

Algorithm for finding similar images

How can I measure the similarity between two images?

comparing images programmatically - lib or class

One more, for .NET: "Are there any OK image recognition libraries for .NET?" This one refers you to AForge, which seems to have the algorithm that you are after.

Upvotes: 0
