Reputation: 169
I'm using the following code, which I found on Stack Overflow and which was suggested as an effective way to get the MD5 hash of a text file's contents. I'm comparing its result with the MD5 hash generated at http://www.miraclesalad.com/webtools/md5.php
However, it isn't returning the same MD5 hash, and I'm not sure where I've gone wrong. The file contents exactly match the text I used to generate the hash on the website, so the two hashes should match, but they don't.
Basically, I want to generate an MD5 hash of some text and compare it with the hash of a text file's contents to see whether they match.
import hashlib

def md5Checksum(filePath):
    with open(filePath, 'rb') as fh:
        m = hashlib.md5()
        # Read the file in 8192-byte chunks until read() returns b''
        while True:
            data = fh.read(8192)
            if not data:
                break
            m.update(data)
        return m.hexdigest()
If I create a text file with the contents "test", then go to http://www.miraclesalad.com/webtools/md5.php, type in "test", and generate a hash, the two hashes are different.
The hash I'm getting back is always the same no matter the contents of the file.
Code to compare the hashes:
filetext = 'LOCATIONTOFILE.txt'
filemd5 = '098f6bcd4621d373cade4e832627b4f6'
if filemd5 != md5Checksum(filetext):
    print('MD5 mismatch')  # placeholder body; the original snippet ends at the if line
I've tried printing the data from both sources, and they are exactly the same.
Hash of "test" from the website:
098f6bcd4621d373cade4e832627b4f6
Hash of the text file with the contents "test":
d41d8cd98f00b204e9800998ecf8427e
UPDATE
Fixed the issue, thanks to Adam Smith.
It was an indentation typo, so the function wasn't returning the digest of the updated hashlib object.
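For anyone hitting the same symptom: d41d8cd98f00b204e9800998ecf8427e is the MD5 of empty input, so getting it for every file suggests update() never received any file data. The exact typo isn't shown in the question, so the sketch below only assumes one plausible form of it, with m.update(data) dedented out of the loop. The names md5_buggy and md5_fixed are illustrative, not from the original code.

```python
import hashlib
import os
import tempfile


def md5_buggy(path):
    # Assumed form of the typo: update() sits outside the loop, so it only
    # ever sees the final, empty read.
    with open(path, "rb") as fh:
        m = hashlib.md5()
        while True:
            data = fh.read(8192)
            if not data:
                break
        m.update(data)
        return m.hexdigest()


def md5_fixed(path):
    # update() is inside the loop, so every chunk is fed to the hash.
    with open(path, "rb") as fh:
        m = hashlib.md5()
        while True:
            data = fh.read(8192)
            if not data:
                break
            m.update(data)
        return m.hexdigest()


# Write b"test" (no trailing newline) to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"test")
    demo_path = tmp.name

buggy_digest = md5_buggy(demo_path)
fixed_digest = md5_fixed(demo_path)
os.remove(demo_path)

print(buggy_digest)  # identical to hashlib.md5(b"").hexdigest()
print(fixed_digest)
```

With the buggy version the digest never depends on the file, which matches "always the same no matter the contents".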
Upvotes: 1
Views: 7458
Reputation: 2723
With only the text test (no blank line after) in both the web generator and Python, I get the MD5 hash:
098f6bcd4621d373cade4e832627b4f6
If I add a carriage return / new line (\n) afterwards I get:
d8e8fca2dc0f896fd7cb4cb0031ba249 # Using the web site
9f06243abcb89c70e0c331c61d871fa7 # Using a Windows machine
d8e8fca2dc0f896fd7cb4cb0031ba249 # Using a Linux machine
The difference is caused by the type of line ending: DOS/Windows uses '\r\n', while Linux uses '\n'.
http://www.cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
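A quick way to check this without touching any files is to hash the three byte strings directly. This is only a small sketch; the digests dictionary is an illustrative name.

```python
import hashlib

# The same visible text with three different endings.
payloads = [b"test", b"test\n", b"test\r\n"]

# Map each byte string to its MD5 hex digest.
digests = {p: hashlib.md5(p).hexdigest() for p in payloads}

for payload, digest in digests.items():
    print(f"{payload!r:12} {digest}")
```

Each ending produces a different digest, so a file saved on Windows and the same text pasted into a web form can legitimately disagree.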
Upvotes: 2
Reputation: 54163
On Windows, I did the following to reproduce.
C:\Users\adsmith\tmp>echo test>test.txt
Then in Python:
>>> import hashlib
>>> a = hashlib.md5()
>>> b = hashlib.md5()
>>> with open("test.txt", "rb") as fh:
...     data = fh.read()
...     a.update(data)
...
>>> with open("test.txt", "rb") as fh:
...     data = fh.read().strip()
...     b.update(data)
...
>>> a.hexdigest()  # from b'test\r\n'
'9f06243abcb89c70e0c331c61d871fa7'
>>> b.hexdigest()  # from b'test'
'098f6bcd4621d373cade4e832627b4f6'
The issue is clearly caused by the line terminator in your file. This should also be a warning not to use lower-level constructs like file.read(bytecount) unless you have to!
>>> open("test.txt", 'rb').read()
b'test\r\n'
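If the goal is to compare text rather than verify a file byte for byte, one option is to drop the trailing line terminator before hashing. This is a rough sketch only: md5_ignoring_trailing_newline is a hypothetical helper, and it deliberately hashes something other than the raw file, so it isn't suitable for integrity checks.

```python
import hashlib


def md5_ignoring_trailing_newline(raw: bytes) -> str:
    # rstrip(b"\r\n") removes any run of trailing CR/LF bytes and nothing else.
    return hashlib.md5(raw.rstrip(b"\r\n")).hexdigest()


expected = "098f6bcd4621d373cade4e832627b4f6"  # MD5 of b"test", from the question

# True when the digest matches regardless of the trailing line terminator.
results = {
    raw: md5_ignoring_trailing_newline(raw) == expected
    for raw in (b"test", b"test\n", b"test\r\n")
}
print(results)
```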
Upvotes: 3
Reputation: 31
Are you sure that your size param is large enough (I can't imagine it wouldn't be, but worth checking)? When I test your code above with a simple value and compare with a standard MD5 hash (using miraclesalad or whatever), I get back a correct response. Carriage returns or special characters could be of some concern too.
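For what it's worth, the chunk size shouldn't change the result, since feeding data to update() in pieces is equivalent to hashing the concatenation. A small sketch to confirm that on in-memory data (md5_in_chunks and the sample payload are made up for illustration):

```python
import hashlib

payload = b"test\r\n" * 5000  # 30,000 bytes of sample data


def md5_in_chunks(data: bytes, chunk_size: int) -> str:
    # Feed the data to the hash object chunk_size bytes at a time.
    m = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        m.update(data[start:start + chunk_size])
    return m.hexdigest()


whole = hashlib.md5(payload).hexdigest()
for size in (1, 7, 8192, 10**6):
    print(size, md5_in_chunks(payload, size) == whole)
```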
Upvotes: 1
Reputation: 3325
The issue could be with newlines. If your file ends in a newline ("test\n"), the MD5 hash would be d8e8fca2dc0f896fd7cb4cb0031ba249.
Line endings can also differ depending on whether you are on a Windows or Unix system.
Upvotes: 2