Reputation: 24208
Hi and thanks for looking!
For the sake of clarity, a third-party .NET library is just fine. Preferably an open-source or free one. The solution need not be native .NET.
I am working on an enterprise web application for which the client has given us thousands of pages of content in MS Word documents that we have to parse, extract data, and send to the content database.
Within these docs are various embedded images representing a larger original image in a separate folder.
The client did not provide any paths to the original source image, so when we see content with an embedded image in the MS Word doc, we have to go through several "assets" folders and look for the corresponding image which is extraordinarily time consuming.
We are already using DocX to parse the documents, so you can assume that we have a list of bitmap images to loop through that we have pulled from the document.
Given a list of bitmaps that we just extracted from the document, how do we search a different folder containing hundreds of images, for the matching image, and then return the file path to it?
TinEye.com does this over the web. I am wondering if, using System.Drawing or something, we can do it on a PC with C#.
Thanks!
Matt
Upvotes: 3
Views: 1616
Reputation: 24208
Hate to propose an answer to my own question, but I think I might be on to something here. Here is heuristic/pseudo code for a C# forms app--your thoughts are appreciated:
<Image> <Path>C:\SomePath</Path> <EncodedString>[Some Base64 String]<Encoded String> </Image>
Now we have an XML file containing all original images, in Base64 form, along with their file path.
foreach
store the XML node Id (or file path) and Levenshtein Distance as a key value pair in an object.foreach
stops if a certain original image has an acceptably low LD score when compared to the image extracted from the document.Since this is a one-off task, I don't need instant performance. So, I could run this tonight before leaving the office and, hopefully, come back tomorrow to a list of paths connecting the original images to the ones embedded in the docs.
The heuristic above worked beautifully! I ended up using the Sift library to efficiently calculate distances between Base64 strings. Specifically, I used their FastDistance() method. Having 100% accuracy on finding the images I need, even if the angle from which the photo was taken is slightly different.
Upvotes: 2
Reputation: 26633
According to this SO answer to a similar question, you should look at OpenCV and VLFeat. The former has a C++ API and the latter a C API, so you would need to write your own P/Invoke wrapper or perhaps wrap them in a C++/CLI facade, which you could call from C#.
Upvotes: 0
Reputation: 51329
There is no built-in algorithm in the .NET framework for generating image similarity. You'd need to use a third-party library or do it yourself. Lots of image similarity algo questions on SO:
Algorithm for finding similar images
How can I measure the similarity between two images?
comparing images programmatically - lib or class
One more, for .NET: Are there any OK image recognition libraries for .NET?. This one refers you to AForge, which seems to have the algorithm that you are after.
Upvotes: 0