Reputation: 42033
How can I open a file, calculate its md5 hash and filesize, while only scanning through the file one time?
Right now I'm doing:
import base64
import hashlib
import os

def getMD5Hash(fname):
    """ Returns an md5 hash
    """
    try:
        with open(fname,'rb') as fo:
            md5 = hashlib.md5()
            chunk_sz = md5.block_size * 128
            data = fo.read(chunk_sz)
            while data:
                md5.update(data)
                data = fo.read(chunk_sz)
        md5hash = base64.urlsafe_b64encode(md5.digest()).decode('UTF-8').rstrip('=\n')
    except IOError:
        md5hash = None
    return md5hash

size = os.path.getsize(fname)
hash = getMD5Hash(fname)
But, from what I understand, this requires two passes of the file and could be more efficient.
Upvotes: 0
Views: 82
Reputation: 17329
A file does not have to be scanned to get its length. The filesystem stores the size in the file's metadata, so os.path.getsize(fname) is a stat lookup and does not read the file's contents.
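As a rough illustration of that point, the sketch below reads the size from metadata two ways: by path with os.path.getsize(), and from an already open handle with os.fstat(). The temporary file and the names demo_path, size_from_path and size_from_handle are made up for the demo and are not from the question.

```python
import os
import tempfile

# Hypothetical demo file: 1000 bytes written to a temporary path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
    demo_path = tmp.name

# Size by path: a stat lookup, the file contents are not read.
size_from_path = os.path.getsize(demo_path)

# Size from an open handle: also metadata only, no read() call is made.
with open(demo_path, "rb") as fo:
    size_from_handle = os.fstat(fo.fileno()).st_size

os.remove(demo_path)
```

If the file is already open for hashing, the os.fstat() form avoids a second lookup by path.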
If you insist on doing it manually, set size = 0, then do size += len(data) inside your while loop.
Of course your getMD5Hash() is now getMD5Hash_and_size().
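A minimal sketch of what that combined function might look like, assuming the same chunked read loop and base64 encoding as the question's code. Returning (None, None) on IOError is an assumption carried over from the original's None return, not something the answer specifies.

```python
import base64
import hashlib


def getMD5Hash_and_size(fname):
    """Return (md5hash, size) from a single read pass.

    Returns (None, None) if the file cannot be read (assumed behaviour).
    """
    md5 = hashlib.md5()
    chunk_sz = md5.block_size * 128
    size = 0
    try:
        with open(fname, 'rb') as fo:
            data = fo.read(chunk_sz)
            while data:
                md5.update(data)
                size += len(data)  # count bytes as they are hashed
                data = fo.read(chunk_sz)
    except IOError:
        return None, None
    md5hash = base64.urlsafe_b64encode(md5.digest()).decode('UTF-8').rstrip('=\n')
    return md5hash, size
```

It would be called as md5hash, size = getMD5Hash_and_size(fname), replacing the separate os.path.getsize() call.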
Upvotes: 3