Ranga Sarin

Reputation: 169

Python compare md5 hash

I'm using the following code, which I found on Stack Overflow, where it was suggested as an effective way to get the MD5 hash of a text file's contents. I'm comparing the result with an MD5 hash I generated at http://www.miraclesalad.com/webtools/md5.php

However, it isn't returning the same MD5 hash, and I'm not sure where I've gone wrong. The file contents exactly match the text I used to generate the hash, so the two hashes should match, but they don't.

Basically, I want to generate an MD5 hash of some text and compare it with the hash of a text file's contents to see whether they match.

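import hashlib

def md5Checksum(filePath):
    # Open in binary mode so the raw bytes are hashed, line endings included.
    with open(filePath, 'rb') as fh:
        m = hashlib.md5()
        while True:
            # Read in 8 KiB chunks so large files need not fit in memory.
            data = fh.read(8192)
            if not data:
                break
            m.update(data)
        return m.hexdigest()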

If I create a text file containing "test", then type "test" into http://www.miraclesalad.com/webtools/md5.php and generate a hash, the two hashes are different.

The hash I'm getting back is always the same no matter the contents of the file.

Code to compare the hashes:

filetxt = 'LOCATIONTOFILE.txt'
filemd5 = '098f6bcd4621d373cade4e832627b4f6'
if filemd5 != md5Checksum(filetxt):
    ...  # body omitted in the original post

I've tried printing the data from both sources, and they are exactly the same.

Hash of "test" from the website: 098f6bcd4621d373cade4e832627b4f6

Hash of the text file containing "test": d41d8cd98f00b204e9800998ecf8427e

UPDATE

Fixed the issue thanks to Adam Smith.

It was an indentation typo, so the function wasn't returning the updated hash.
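The exact typo isn't shown, so the following is only a guess at what it might have looked like. It assumes the `m.update(data)` line was indented one level too far, which puts it after `break` where it can never run. The name `md5_buggy` is made up for this sketch. A hash object that never receives any data returns the MD5 of empty input, which is the `d41d8cd9...` value reported above and would explain why the result never changed with the file contents.

```python
import hashlib
import os
import tempfile

def md5_buggy(file_path):
    """Hypothetical reconstruction of the typo: update() sits under `break`."""
    with open(file_path, 'rb') as fh:
        m = hashlib.md5()
        while True:
            data = fh.read(8192)
            if not data:
                break
                m.update(data)  # unreachable, so the hash never sees any data
        return m.hexdigest()

# Write a small file whose contents should hash to 098f6bcd...
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"test")

print(md5_buggy(tmp.name))           # d41d8cd98f00b204e9800998ecf8427e
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
os.remove(tmp.name)
```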

Upvotes: 1

Views: 7458

Answers (4)

ode2k

Reputation: 2723

With only the text test (no blank line after it) in both the web generator and Python, I get the MD5 hash:

098f6bcd4621d373cade4e832627b4f6

If I add a line break afterwards, I get:

d8e8fca2dc0f896fd7cb4cb0031ba249 # Using the web site

9f06243abcb89c70e0c331c61d871fa7 # Using a Windows machine

d8e8fca2dc0f896fd7cb4cb0031ba249 # Using a Linux machine

The difference is caused by the type of line ending: DOS/Windows uses a carriage return plus a line feed ('\r\n'), while Linux uses a line feed only ('\n').

http://www.cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
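To check this without creating any files, one option is to hash the three byte strings directly. This is a small sketch; the `digests` name is just for illustration. The printed values should line up with the hashes quoted above.

```python
import hashlib

# Same visible text, three different byte sequences.
payloads = (b"test", b"test\n", b"test\r\n")
digests = {p: hashlib.md5(p).hexdigest() for p in payloads}

for payload, digest in digests.items():
    print(repr(payload), digest)
```

Because MD5 works on bytes, a single extra '\r' or '\n' is enough to produce a completely different digest.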

Upvotes: 2

Adam Smith

Reputation: 54163

On Windows, I did the following to reproduce the problem:

C:\Users\adsmith\tmp>echo test>test.txt

Then in Python:

>>> import hashlib
>>> a = hashlib.md5()
>>> b = hashlib.md5()
>>> with open("test.txt", "rb") as fh:
...     data = fh.read()
...     a.update(data)
...
>>> with open("test.txt", "rb") as fh:
...     data = fh.read().strip()
...     b.update(data)
...
>>> a.hexdigest()  # from b'test\r\n'
'9f06243abcb89c70e0c331c61d871fa7'
>>> b.hexdigest()  # from b'test'
'098f6bcd4621d373cade4e832627b4f6'

The issue is clearly caused by the line terminator in your file. This should also be a warning not to use lower-level constructs like file.read(bytecount) unless you have to!

>>> open("test.txt", 'rb').read()
# b'test\r\n'
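If the trailing line terminator should simply be ignored when comparing, a helper along these lines might do. It is only a sketch: `md5_ignoring_trailing_newline` is a made-up name, it reads the whole file into memory, and unlike `.strip()` it removes only trailing '\r' and '\n' bytes, leaving other whitespace alone.

```python
import hashlib
import os
import tempfile

def md5_ignoring_trailing_newline(file_path):
    """Hash a file's bytes after dropping trailing CR/LF bytes (hypothetical helper)."""
    with open(file_path, 'rb') as fh:
        data = fh.read().rstrip(b"\r\n")  # whole file in memory; fine for small files
    return hashlib.md5(data).hexdigest()

# Simulate what `echo test>test.txt` produces on Windows.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"test\r\n")

digest = md5_ignoring_trailing_newline(tmp.name)
os.remove(tmp.name)
print(digest)  # 098f6bcd4621d373cade4e832627b4f6
```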

Upvotes: 3

S. Rhea

Reputation: 31

Are you sure that your size param is large enough (I can't imagine it wouldn't be, but worth checking)? When I test your code above with a simple value and compare with a standard MD5 hash (using miraclesalad or whatever), I get back a correct response. Carriage returns or special characters could be of some concern too.
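For what it's worth, the chunk size passed to read() should not change the result, since repeated update() calls are equivalent to a single call on the concatenated data. A quick way to confirm this, using a made-up `md5_in_chunks` helper on in-memory bytes rather than a file:

```python
import hashlib

def md5_in_chunks(data, chunk_size):
    """Feed `data` to MD5 in pieces of `chunk_size` bytes (illustrative helper)."""
    m = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        m.update(data[start:start + chunk_size])
    return m.hexdigest()

payload = b"test\r\n" * 5000
one_shot = hashlib.md5(payload).hexdigest()

# Every chunk size yields the same digest as hashing everything at once.
print(md5_in_chunks(payload, 1) == one_shot)     # True
print(md5_in_chunks(payload, 8192) == one_shot)  # True
```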

Upvotes: 1

jaynp

Reputation: 3325

The issue could be with newlines. If your file ends in a newline (for example, "test\n"), the MD5 hash would be d8e8fca2dc0f896fd7cb4cb0031ba249.

Line endings can also differ depending on whether you are on a Windows or Unix system.
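If the same text needs to hash identically on both systems, one possible approach is to read the file in text mode, where Python translates '\r\n' to '\n', and hash the re-encoded text. This is a sketch under assumptions: the file is assumed to be UTF-8, and `md5_normalized_text` is a made-up name. The result is the hash of the normalized text, not of the raw bytes on disk.

```python
import hashlib
import os
import tempfile

def md5_normalized_text(file_path, encoding="utf-8"):
    """Hash file text with line endings normalized to '\\n' (hypothetical helper)."""
    # newline=None enables universal newlines: '\r\n' and '\r' are read as '\n'.
    with open(file_path, "r", encoding=encoding, newline=None) as fh:
        text = fh.read()
    return hashlib.md5(text.encode(encoding)).hexdigest()

with tempfile.TemporaryDirectory() as tmp_dir:
    unix_path = os.path.join(tmp_dir, "unix.txt")
    dos_path = os.path.join(tmp_dir, "dos.txt")
    with open(unix_path, "wb") as fh:
        fh.write(b"test\n")    # Unix-style line ending
    with open(dos_path, "wb") as fh:
        fh.write(b"test\r\n")  # Windows-style line ending
    unix_digest = md5_normalized_text(unix_path)
    dos_digest = md5_normalized_text(dos_path)

print(unix_digest == dos_digest)  # True
```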

Upvotes: 2
