ensnare

Reputation: 42033

Get filesize and md5 hash in one pass, reading the file only once

How can I open a file, calculate its md5 hash and filesize, while only scanning through the file one time?

Right now I'm doing:

import base64
import hashlib
import os

def getMD5Hash(fname):
  """ Returns the URL-safe base64 md5 hash of fname, or None if it cannot be read
  """
  try:
    with open(fname,'rb') as fo:
      md5 = hashlib.md5()
      chunk_sz = md5.block_size * 128  # 64-byte block * 128 = 8 KiB per read
      data = fo.read(chunk_sz)
      while data:
        md5.update(data)
        data = fo.read(chunk_sz)
    md5hash = base64.urlsafe_b64encode(md5.digest()).decode('UTF-8').rstrip('=\n')
  except IOError:
    md5hash = None

  return md5hash

size = os.path.getsize(fname)
hash = getMD5Hash(fname)

But from what I understand, this requires two passes over the file and could be more efficient.

Upvotes: 0

Views: 82

Answers (1)

Adam

Reputation: 17329

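A file does not have to be scanned to get its length. The filesystem stores the size as metadata, so os.path.getsize() returns it without reading the file's contents. Your current code therefore reads the file only once.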
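
To illustrate, here is a minimal sketch of where the size comes from. The helper name size_from_metadata is made up for this example; it only wraps os.stat(), which is the same metadata lookup os.path.getsize() relies on.

```python
import os

def size_from_metadata(fname):
    """Return the size of fname in bytes without reading its contents."""
    # os.stat() asks the filesystem for the file's metadata;
    # st_size is the file length in bytes.
    return os.stat(fname).st_size
```

The cost of this call does not grow with the size of the file, since no file data is read.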

If you insist on doing it manually, set size = 0 then do size += len(data) inside your while loop.

Of course, your getMD5Hash() then returns both values, so it is really getMD5Hash_and_size().
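
A rough sketch of that manual approach, assuming the same chunk size and base64 encoding as in the question. The name get_md5_hash_and_size and the (None, None) return on error are my own choices for illustration, not anything from your code.

```python
import base64
import hashlib

def get_md5_hash_and_size(fname):
    """Return (md5hash, size) from a single read pass, or (None, None) on IOError."""
    md5 = hashlib.md5()
    chunk_sz = md5.block_size * 128
    size = 0
    try:
        with open(fname, 'rb') as fo:
            # iter() with a sentinel keeps calling fo.read() until it returns b''.
            for data in iter(lambda: fo.read(chunk_sz), b''):
                md5.update(data)
                size += len(data)  # count the bytes as they are hashed
    except IOError:
        return None, None
    md5hash = base64.urlsafe_b64encode(md5.digest()).decode('UTF-8').rstrip('=\n')
    return md5hash, size
```

Usage would be md5hash, size = get_md5_hash_and_size(fname). One thing in favor of counting this way: the size is guaranteed to match the bytes that were actually hashed, even if the file changes between a separate getsize() call and the read.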

Upvotes: 3
