Reputation: 40140
I am using Python to generate a C++ header file. It is security classified, so I can't post it here.
I generate it based on certain inputs and, if those don't change, the same file should be generated.
Because it is a header file which is #included almost everywhere, touching it causes a full build. So, if there is no change, I do not want to generate the file.
The simplest approach seemed to be to generate the file in /tmp
then take an MD5 hash of the existing file, to see if it needs to be updated.
existingFileMd5 = hashlib.md5(open(headerFilePath, 'rb').read())
newFileMd5 = hashlib.md5(open(tempFilePath, 'rb').read())
if newFileMd5 == existingFileMd5:
print('Info: file "' + headerFilePath + '" unchanged, so not updated')
os.remove(tempFilePath)
else:
shutil.move(tempFilePath, headerFilePath)
print('Info: file "' + headerFilePath + '" updated')
However, when I run the script twice in quick succession (without changing the inputs), it seems to always think that the MD5 hashes are different and updates the file, thus reducing build time.
There are no variable parts to the file, other than those governed by the input. E.g, I am not writing a timestamp.
I have had colleagues eyeball the two files and declare them to be identical (they are quite small). They are also declared to be identical by Linux's meld
file compare utility.
So, the problem would seem to be with the code posted above. What am I doing wrong?
Upvotes: 1
Views: 545
Reputation: 280181
You forgot to actually ask for the hashes. You're comparing two md5-hasher-thingies, not the hashes.
Call digest
to get the hash as a bytes
object, or hexdigest
to get a string with a hex encoding of the hash:
if newFileMd5.digest() == existingFileMd5.digest():
...
Upvotes: 4