Using Python, how can I compute the MD5 hash of a TIFF image, excluding all metadata?
With a JPG, I do something like this:
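```python
import hashlib
import os
import struct

def jpeg(fh):
    md5 = hashlib.md5()
    assert fh.read(2) == b"\xff\xd8"  # SOI (Start of Image) marker
    while True:
        # Each segment starts with a 2-byte marker and a 2-byte big-endian length
        marker, length = struct.unpack(">2H", fh.read(4))
        assert marker & 0xff00 == 0xff00  # every marker begins with 0xFF
        if marker == 0xFFDA:  # Start of Scan: hash everything from here on
            md5.update(fh.read())
            break
        else:
            # Skip this segment; the length includes its own 2 bytes
            fh.seek(length - 2, os.SEEK_CUR)
    print("Hash: %r" % md5.hexdigest())

with open("test.jpg", "rb") as fh:  # binary mode: the markers are bytes
    jpeg(fh)
# Hash: 'debb4956941795d6ef48717e4c9cc433'
```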
I'm not sure how to extend this to TIFF images.
It seems trickier with TIFFs because the location of the metadata within the file can change (it's not always at the beginning or end).
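In a TIFF, the image data is located through tags in each IFD (StripOffsets/StripByteCounts, or TileOffsets/TileByteCounts), so the position of the other tags does not matter if you follow those pointers. The following is a minimal standard-library sketch of that idea, assuming a classic (non-BigTIFF) file whose offset and byte-count tags use the SHORT or LONG types; the name `tiff_pixel_md5` is hypothetical. It hashes the stored bytes of every IFD in the chain (including any thumbnails), and those bytes may be compressed, so the same pixels saved with different compression will give different digests.

```python
import hashlib
import struct

# Tags that locate the image data in a classic TIFF (TIFF 6.0)
STRIP_OFFSETS, STRIP_BYTE_COUNTS = 273, 279
TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325
DATA_TAGS = {STRIP_OFFSETS, STRIP_BYTE_COUNTS, TILE_OFFSETS, TILE_BYTE_COUNTS}
TYPE_FORMATS = {3: ("H", 2), 4: ("L", 4)}  # field type -> (struct code, size): SHORT, LONG


def _read_values(data, endian, typ, count, field_pos):
    """Return the values of one IFD entry whose value field starts at field_pos."""
    code, size = TYPE_FORMATS[typ]
    if count * size <= 4:
        pos = field_pos  # small values are stored inline in the entry
    else:
        (pos,) = struct.unpack_from(endian + "L", data, field_pos)  # otherwise an offset
    return list(struct.unpack_from(endian + code * count, data, pos))


def tiff_pixel_md5(data):
    """MD5 of the strip/tile bytes of every IFD; all other tags are ignored."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]  # little- or big-endian file
    magic, ifd = struct.unpack_from(endian + "HL", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    md5 = hashlib.md5()
    while ifd:  # a next-IFD offset of 0 ends the chain
        (n,) = struct.unpack_from(endian + "H", data, ifd)
        tags = {}
        for i in range(n):
            entry = ifd + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, count = struct.unpack_from(endian + "HHL", data, entry)
            if tag in DATA_TAGS:
                tags[tag] = _read_values(data, endian, typ, count, entry + 8)
        offsets = tags.get(STRIP_OFFSETS) or tags.get(TILE_OFFSETS)
        counts = tags.get(STRIP_BYTE_COUNTS) or tags.get(TILE_BYTE_COUNTS)
        if offsets is None or counts is None:
            raise ValueError("IFD at %d has no strip or tile data" % ifd)
        for off, cnt in zip(offsets, counts):
            md5.update(data[off:off + cnt])
        (ifd,) = struct.unpack_from(endian + "L", data, ifd + 2 + 12 * n)
    return md5.hexdigest()
```

Call it on the whole file, e.g. `tiff_pixel_md5(open("test.tif", "rb").read())`. Reading the bytes into memory keeps the offset arithmetic simple; for very large files you would seek and read each strip instead.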
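Use the `Image` module from the Python Imaging Library (today maintained as Pillow). The `tobytes` method of the `Image` class (called `tostring` in the original PIL) returns the decoded pixel data as a bytes string, without any of the TIFF tags.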
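```python
from PIL import Image  # Pillow, the maintained fork of PIL
import hashlib

def hashtiff(fn):
    with Image.open(fn) as tf:
        # tobytes() returns the decoded pixels only; metadata is not included
        return hashlib.md5(tf.tobytes()).hexdigest()
```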
MD5 has known weaknesses as a hash algorithm (practical collision attacks exist), so SHA-256 or SHA-512 is generally a better choice; both are available in `hashlib`.
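Switching algorithms only changes the hash constructor. A small sketch using `hashlib.new`, where the helper name `pixel_digest` is hypothetical and its input is assumed to be the bytes returned by `tobytes()`:

```python
import hashlib

def pixel_digest(pixel_bytes, algorithm="sha256"):
    """Hex digest of raw pixel bytes using any algorithm hashlib supports."""
    return hashlib.new(algorithm, pixel_bytes).hexdigest()

print(pixel_digest(b"\x00\x01\x02"))            # SHA-256: 64 hex characters
print(pixel_digest(b"\x00\x01\x02", "sha512"))  # SHA-512: 128 hex characters
```

Inside `hashtiff` this would be `pixel_digest(tf.tobytes())`.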