Reputation: 5933
I am looking to store a fingerprint for each of about 1 million images, so that when a new image is uploaded the site returns a percentage showing how similar it is to the closest matches in the database. This is similar to http://www.tineye.com/ and http://images.google.com/, but for my own personal site. I do not want to submit the images to TinEye using their submission process.
What information should be saved?
How should I save it?
Any good PHP libraries that do what I want already?
I would like to keep it PHP only, but I think the heavy processing may need to be handed off to an external application, with PHP then processing its output. I am running Debian Linux.
For storage, I was going to store just the information in MySQL, but I think it may be inefficient given 1 million images.
Upvotes: 1
Views: 2847
Reputation: 9627
If it's "perceptual hashes" you are looking for, you could also have a look at:
They offer a PHP extension, too.
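To make the idea of a perceptual hash concrete, here is a minimal sketch of one common variant, the difference hash (dHash), together with a similarity percentage derived from the Hamming distance. It is written in Python rather than PHP, on the assumption that hashing is handed off to a helper process as the question suggests. The function names (`dhash`, `similarity_percent`, `find_similar`) are made up for illustration, the input is assumed to be a grayscale grid of 8 rows by 9 columns already produced by shrinking the image (for example with GD or ImageMagick), and this is not the algorithm of any particular library.

```python
def dhash(gray_rows):
    """Compute a 64-bit difference hash from 8 rows of 9 grayscale values.

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so uniform brightness changes leave the hash unchanged.
    """
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity_percent(a, b, nbits=64):
    """Share of matching bits, expressed as a percentage."""
    return 100.0 * (nbits - hamming_distance(a, b)) / nbits


def find_similar(query_hash, stored, threshold=90.0):
    """Linear scan over a dict of image_id -> hash (hypothetical storage).

    Returns (image_id, similarity) pairs at or above the threshold,
    best match first.
    """
    scored = ((image_id, similarity_percent(query_hash, h))
              for image_id, h in stored.items())
    matches = [(image_id, score) for image_id, score in scored
               if score >= threshold]
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

A 64-bit hash like this fits in a MySQL `BIGINT UNSIGNED` column, and `BIT_COUNT(hash ^ ?)` gives the Hamming distance inside a query. Whether a full scan over a million rows is fast enough is something you would have to measure; the 90% threshold here is an arbitrary starting point, not a recommended value.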
Upvotes: 0
Reputation: 5933
I decided to go for this PHP solution:
http://www.pureftpd.org/project/libpuzzle
Even though it's a little outdated and didn't quite work with a cropped image, it was able to identify small edits, color changes, and some resizes. It also comes with example PHP code (albeit buggy).
Upvotes: 1
Reputation: 27087
I think you should use GD or ImageMagick, and it would be good to use a range of APIs. Since you are only talking about your own site, the API issue is not really paramount; an API would matter more for a bigger app.
Example
Image is uploaded
Image information is saved to the database; the image is then deleted from the server and stored on the CDN
Image information saved to the database:
Size
Dimensions
Timestamp
Uploader
Type of Image
Image Category
Image Tags
Image Description
You could then run cron jobs that scan the images for their dominant colour, for shapes, and for whether an image consists mostly of text (and what that text says). From the results you can build a library of matching tags keyed by numeric IDs; these are your patterns. You can then scan for identical images and matching patterns. You could go even deeper, but then you might as well be competing with Google/IBM.
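As a rough illustration of the colour scan, here is a small Python sketch that finds the most common colour by quantizing each pixel into coarse buckets. The function name `dominant_color` and the `levels` parameter are made up for this example, and the pixels are assumed to be already decoded into `(r, g, b)` tuples by whatever image library the cron job uses.

```python
from collections import Counter


def dominant_color(pixels, levels=4):
    """Return the centre RGB value of the most populated colour bucket.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    levels: number of buckets per channel (4 gives 64 buckets in total).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels")
    step = 256 // levels
    # Count how many pixels fall into each quantized colour bucket.
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    (rb, gb, bb), _ = counts.most_common(1)[0]
    half = step // 2
    return (rb * step + half, gb * step + half, bb * step + half)
```

The returned bucket centre could be stored next to the other image information as a numeric tag, so that images with the same dominant colour can be found with a plain equality query.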
Upvotes: 1
Reputation: 9206
Such analysis is done using complex algorithms such as SIFT (scale-invariant feature transform):
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
This one is patented (rather than copyrighted), but implementations with source code are available on the net.
Upvotes: 0