Reputation: 42033
I'm looking for a way to create a unique hash for images in Python and PHP.
I thought about using the md5 sum of the original file because it can be generated quickly, but when I update the EXIF information (sometimes the timezone is off) the file contents change, and so does the hash.
Are there any other ways I can create a hash for these files that will not change when the EXIF info is updated? Efficiency is a concern, as I will be creating hashes for ~500k 30MB images.
Maybe there's a way to create an md5 hash of the image that excludes the EXIF part (I believe it's written at the beginning of the file)? Thanks in advance. Example code is appreciated.
Upvotes: 4
Views: 2952
Reputation: 40374
ImageMagick already provides a method to get the image signature. According to the PHP documentation:
Generates an SHA-256 message digest for the image pixel stream.
So my understanding is that the signature isn't affected by changes in the EXIF information.
Also, I've checked that the PythonMagick.Image.signature method is available in the Python bindings, so you should be able to use it in both languages.
Upvotes: 3
Reputation: 879371
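In Python, you could use Pillow's Image.tobytes() (called Image.tostring() in old PIL releases) to compute the md5 hash of the image data only, without the metadata.
import hashlib
from PIL import Image
filename = 'photo.jpg'  # placeholder path; substitute your own file
# Decode the pixels and normalise the mode so the raw bytes are comparable
img = Image.open(filename).convert('RGBA')
m = hashlib.md5()
# tobytes() returns the raw pixel data only, without EXIF or other metadata
m.update(img.tobytes())
print(m.hexdigest())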
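Decoding every image may be slow for ~500k files of 30MB each. If the files are JPEGs, a cheaper variant of the "hash everything except the EXIF part" idea from the question is to walk the segment headers and leave the APP1 segments (where EXIF and XMP are stored) out of the hash, without decoding any pixels. The following is only a rough sketch under those assumptions: the function name jpeg_hash_without_exif is made up for illustration, it handles JPEG only, and because it hashes the compressed data, re-encoding an image will still change the result.

```python
import hashlib
import struct


def jpeg_hash_without_exif(path, chunk_size=1 << 20):
    """Hypothetical helper: MD5 of a JPEG file with its APP1 segments skipped."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        if f.read(2) != b"\xff\xd8":  # every JPEG starts with the SOI marker
            raise ValueError("not a JPEG file")
        while True:
            marker = f.read(2)
            if len(marker) < 2 or marker[0] != 0xFF:
                raise ValueError("unexpected data before the scan header")
            if marker[1] == 0xDA:  # SOS: compressed image data follows
                md5.update(marker)
                break
            raw_length = f.read(2)
            if len(raw_length) < 2:
                raise ValueError("truncated segment header")
            # The big-endian length counts its own two bytes.
            (length,) = struct.unpack(">H", raw_length)
            payload = f.read(length - 2)
            if marker[1] != 0xE1:  # hash every segment except APP1 (EXIF/XMP)
                md5.update(marker + raw_length + payload)
        # Hash the remainder of the file (scan data through EOI) in chunks.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Calling jpeg_hash_without_exif('photo.jpg') before and after an EXIF-only edit should give the same digest, provided the editing tool rewrites only the APP1 segment and leaves the rest of the file byte-for-byte intact.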
Upvotes: 1