Reputation: 3715
I'm using the following code to get a MD5 hash for several files with an approx. total size of 1GB:
import hashlib

md5 = hashlib.md5()
with open(filename, 'rb') as f:
    # read in chunks of 128 * md5.block_size (128 * 64 = 8 KB) so the whole file never sits in memory
    for chunk in iter(lambda: f.read(128 * md5.block_size), b''):
        md5.update(chunk)
fileHash = md5.hexdigest()
For me it's pretty fast, taking about 3 seconds to complete. But unfortunately for my users (who have old PCs), this method is very slow, and from my observations it may take about 4 minutes for some users to get all of the file hashes. This is a very annoying process for them, but at the same time I think this is the simplest & fastest way possible - am I right?
Would it be possible to speed up the hash-collecting process somehow?
Upvotes: 2
Views: 4938
Reputation: 28596
I have a fairly weak laptop as well, and I just tried it - I can also md5 one GB in four seconds. For it to take several minutes, I suspect it's not the calculation but reading the file from the hard disk. Try reading 1 MB blocks, i.e., f.read(2**20). That should need far fewer reads and increase the overall reading speed.
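A minimal sketch of that change, keeping the same iter()-based loop from the question; the helper name file_md5 and the chunk_size parameter are just for illustration:

import hashlib

def file_md5(filename, chunk_size=2**20):
    # chunk_size of 2**20 = 1 MB: fewer read() calls than 8 KB chunks,
    # so slow disks spend less time on per-call overhead
    md5 = hashlib.md5()
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

fileHash = file_md5(filename)

Apart from the larger chunk size, this does exactly what your original loop does, so the resulting hash is identical.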
Upvotes: 3