Mawg

Reputation: 40140

Python hashes don't match

I am using Python to generate a C++ header file. It is security classified, so I can't post it here.

I generate it based on certain inputs and, if those don't change, the same file should be generated.

Because it is a header file which is #included almost everywhere, touching it causes a full build. So, if nothing has changed, I do not want to rewrite the file.

The simplest approach seemed to be to generate the file in /tmp, then compare its MD5 hash with that of the existing file to see whether the existing file needs to be updated.

existingFileMd5 = hashlib.md5(open(headerFilePath,  'rb').read())
newFileMd5 = hashlib.md5(open(tempFilePath,  'rb').read())
if newFileMd5 == existingFileMd5:
    print('Info:    file "' + headerFilePath + '" unchanged, so not updated')
    os.remove(tempFilePath)
else:
    shutil.move(tempFilePath, headerFilePath)
    print('Info:    file "' + headerFilePath + '" updated')

However, when I run the script twice in quick succession (without changing the inputs), it always seems to think that the MD5 hashes are different and updates the file, triggering exactly the full rebuild I am trying to avoid.

There are no variable parts to the file, other than those governed by the input. For example, I am not writing a timestamp.

I have had colleagues eyeball the two files and declare them to be identical (they are quite small). The meld file comparison tool on Linux also reports them as identical.

So, the problem would seem to be with the code posted above. What am I doing wrong?

Upvotes: 1

Views: 545

Answers (1)

user2357112

Reputation: 280181

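You forgot to actually ask for the hashes. You're comparing two md5 hash objects (hasher thingies), not the hashes themselves. Hash objects don't compare by content, so == falls back to an identity check, and two separately created objects are never equal, no matter what data you fed them.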
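You can see this in isolation with a minimal sketch like the following; the byte string is just an arbitrary stand-in for the real header contents:

```python
import hashlib

data = b"#pragma once\n"  # arbitrary stand-in for the generated header

a = hashlib.md5(data)
b = hashlib.md5(data)

# Two distinct hash objects fed identical bytes still compare unequal,
# because == on hash objects is an identity check, not a content check.
print(a == b)                           # False

# Comparing the finished hashes gives the intended result.
print(a.digest() == b.digest())         # True
print(a.hexdigest() == b.hexdigest())   # True
```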

Call digest() to get the hash as a bytes object, or hexdigest() to get a string containing the hex encoding of the hash:

if newFileMd5.digest() == existingFileMd5.digest():
    ...
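Putting it together, the question's update-if-changed logic might look roughly like this. It's a sketch, not the asker's actual code: the helper names file_md5 and update_if_changed are hypothetical, the files are opened with `with` so they are closed promptly, and, as an assumption beyond the original, a missing header (e.g. on the very first run) is treated as "changed" rather than raising an error:

```python
import hashlib
import os
import shutil


def file_md5(path):
    """Return the hex MD5 digest of the file at path (hypothetical helper)."""
    with open(path, "rb") as f:  # 'with' closes the file when done
        return hashlib.md5(f.read()).hexdigest()


def update_if_changed(temp_file_path, header_file_path):
    """Move temp_file_path over header_file_path only if their contents differ.

    Returns True if the header was (re)written, False if it was left untouched.
    """
    # Compare the hex digests (strings), not the hash objects themselves.
    if os.path.exists(header_file_path) and \
            file_md5(temp_file_path) == file_md5(header_file_path):
        print('Info:    file "' + header_file_path + '" unchanged, so not updated')
        os.remove(temp_file_path)
        return False
    # Header is missing or different: replace it with the freshly generated file.
    shutil.move(temp_file_path, header_file_path)
    print('Info:    file "' + header_file_path + '" updated')
    return True
```

Because the header is never touched when the digests match, its modification time stays the same and the build system has no reason to rebuild. hexdigest() is used here only because it is convenient to log; comparing digest() values works just as well.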

Upvotes: 4
