ensnare

Reputation: 42033

Compute the hash of only the core image data of a TIFF

Using Python, how can I compute the MD5 hash of a TIFF image, excluding all metadata?

With a JPG, I do something like this:

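import hashlib
import os
import struct

def jpeg(fh):
    # fh must be a file object opened in binary mode
    hash = hashlib.md5()
    assert fh.read(2) == b"\xff\xd8"  # SOI marker at the start of every JPEG
    while True:
        marker, length = struct.unpack(">2H", fh.read(4))
        assert marker & 0xff00 == 0xff00
        if marker == 0xFFDA:  # Start of scan: hash everything from here on
            hash.update(fh.read())
            break
        else:
            # Skip this segment's payload (length includes its own 2 bytes)
            fh.seek(length - 2, os.SEEK_CUR)
    print("Hash: %r" % hash.hexdigest())

>>> jpeg(open("test.jpg", "rb"))
Hash: 'debb4956941795d6ef48717e4c9cc433'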

I'm not sure how to extend this to TIFF images.

It seems trickier with TIFFs because the location of the metadata within the image can change (it's not always at the beginning or end).

Upvotes: 2

Views: 1197

Answers (1)

Roland Smith

Reputation: 43495

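Use the Image module from the Python Imaging Library (today maintained as Pillow). The tobytes method of the Image class, called tostring in older PIL releases, returns the decoded pixel data as a byte string.

from PIL import Image  # Pillow, the maintained fork of the Python Imaging Library
import hashlib

def hashtiff(fn):
    # Decode the image and hash only its pixel data; metadata tags are not included
    with Image.open(fn) as tf:
        return hashlib.md5(tf.tobytes()).hexdigest()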

MD5 has known collision weaknesses as a hash algorithm. If the hash needs to hold up against deliberately crafted files, it is better to use e.g. SHA-256 or SHA-512.
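If you would rather stay close to the JPEG approach from the question and avoid decoding the image, here is a rough standard-library sketch that walks the TIFF directory and hashes only the strip bytes with SHA-256. It is not part of the PIL solution above, and the function name hash_tiff_strips is made up for illustration. It assumes a classic (non-BigTIFF) file, looks only at the first IFD, and handles strip-based images only, not tiled ones. It also hashes the bytes as stored, so two files with identical pixels but different compression would get different hashes.

```python
import hashlib
import struct

# Tag numbers from the TIFF 6.0 specification
STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279
# struct code and byte size for the field types these tags use (SHORT, LONG)
TYPE_SIZES = {3: ("H", 2), 4: ("I", 4)}


def hash_tiff_strips(data, algorithm="sha256"):
    """Hash the raw strip bytes of the first IFD in a classic TIFF (sketch)."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")

    (n_entries,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from(endian + "HHI", data, entry)
        if tag not in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
            continue  # every other tag is metadata as far as we care
        code, size = TYPE_SIZES[ftype]
        # Values that fit in 4 bytes are stored inline in the entry;
        # otherwise the entry holds an offset to where the values live.
        if size * count <= 4:
            start = entry + 8
        else:
            (start,) = struct.unpack_from(endian + "I", data, entry + 8)
        tags[tag] = struct.unpack_from(endian + str(count) + code, data, start)

    if STRIP_OFFSETS not in tags or STRIP_BYTE_COUNTS not in tags:
        raise ValueError("no strip data found (tiled TIFFs are not handled)")

    digest = hashlib.new(algorithm)
    for offset, length in zip(tags[STRIP_OFFSETS], tags[STRIP_BYTE_COUNTS]):
        digest.update(data[offset:offset + length])
    return digest.hexdigest()
```

You would call it with the whole file read in binary mode, e.g. hash_tiff_strips(open("test.tif", "rb").read()). Because the strip locations come from the directory entries, it does not matter where in the file the metadata ends up.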

Upvotes: 5
