In Python I need to print a diff of two binary files. I was looking at difflib.Differ, which does a lot.
However, Differ assumes lines of text, so its output does not list the byte index or the hex values of the differing bytes.
What I need is output showing which byte is different, how it is different, and the actual hex values of the two bytes.
In Python, how do you compare two binary files (output: the byte diff index, the hex values of the two bytes)?
I was doing something like:
#!/usr/bin/env python2
import difflib
# open in binary mode ('rb'), since these are not text files
x = open('/path/to/file1', 'rb').read()
y = open('/path/to/file2', 'rb').read()
print '\n'.join(difflib.Differ().compare(x, y))
But this doesn't output the byte index where the difference occurs, and it doesn't print the hex values.
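For comparison, here is a minimal Python 3 sketch of the requested output without difflib: read both files as bytes and compare them position by position. The names `byte_diff`, `fmt` and `print_byte_diff` are hypothetical. It assumes both files fit in memory and reports bytes past the end of the shorter file as missing (`--`). Because it compares by position, a single inserted byte makes every later byte look different.

```python
from itertools import zip_longest


def byte_diff(path1, path2):
    """Yield (offset, byte1, byte2) for every offset where the files differ.

    byte1/byte2 are ints (0-255), or None past the end of the shorter file.
    """
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        data1, data2 = f1.read(), f2.read()  # whole files in memory
    # Iterating over bytes yields ints in Python 3; zip_longest pads with None.
    for offset, (b1, b2) in enumerate(zip_longest(data1, data2)):
        if b1 != b2:
            yield offset, b1, b2


def fmt(b):
    """Format one byte as hex, or '--' if it is missing."""
    return "--" if b is None else "0x%02x" % b


def print_byte_diff(path1, path2):
    """Print one line per differing byte: 0-based offset and both hex values."""
    for offset, b1, b2 in byte_diff(path1, path2):
        print("byte %d: %s != %s" % (offset, fmt(b1), fmt(b2)))
```

Calling `print_byte_diff('/path/to/file1', '/path/to/file2')` on two 10-byte files that differ only at offset 5 (0x05 vs 0xff) prints `byte 5: 0x05 != 0xff`.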
When Differ compares two strings, it yields every character with a two-character prefix: "- " (only in the first sequence), "+ " (only in the second), or two spaces (in both). Below, x and y are compared and then the output is inspected:
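import difflib
d = difflib.Differ()
e = list(d.compare(x, y))  # compare() returns a generator; materialize it
for i, item in enumerate(e):
    if item.startswith("-"):  # this char is in x but not in y
        # i is the position in the diff output, not the byte offset in the file
        print(item + " (diff entry %d) is different" % i)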
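Chars that start with "- " appear only in x, and chars that start with "+ " appear only in y; chars present in both start with two spaces. Note that i counts entries in the diff output, which is not the same as the byte offset in either file.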
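Differ is built on `difflib.SequenceMatcher`, and calling SequenceMatcher directly on the two byte strings gives each differing region together with its offsets in both inputs, which can then be printed in hex. A minimal Python 3 sketch (`bytes.hex()` needs 3.5+); `aligned_byte_diff` is a hypothetical name. Unlike a position-by-position comparison, this aligns the inputs, so an inserted byte shows up as one `insert` instead of shifting every later byte. SequenceMatcher can be slow on large files.

```python
import difflib


def aligned_byte_diff(data1, data2):
    """Return (tag, offset1, hex1, offset2, hex2) for each differing region.

    tag is 'replace', 'delete' or 'insert' as reported by get_opcodes();
    offsets are 0-based positions in data1 and data2.
    """
    # autojunk=False: the default heuristic treats very frequent items as junk
    # in sequences of 200+ elements, which is rarely wanted for binary data.
    matcher = difflib.SequenceMatcher(None, data1, data2, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # skip regions that match
        report.append((tag, i1, data1[i1:i2].hex(), j1, data2[j1:j2].hex()))
    return report
```

For example, `aligned_byte_diff(b"abcd", b"abXcd")` returns `[('insert', 2, '', 2, '58')]`: one byte (0x58, `X`) inserted at offset 2, with the rest still matching.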
The shell command cmp already does exactly what I need. Reimplementing that functionality in Python would take more effort, code, and time, so I just call the command from Python:
#!/usr/bin/env python2
import commands
import numpy as np

def run_cmp(filename1, filename2):
    cmd = 'cmp --verbose %s %s'%(filename1, filename2)
    # the `commands` module was removed in Python 3; use subprocess.getstatusoutput there
    status, output = commands.getstatusoutput(cmd)
    # in Python 2, status is the raw wait status (exit code * 256);
    # since (256*k) % 255 == k, this recovers the exit code
    status = status if status < 255 else status%255
    if status > 1:
        raise RuntimeError('cmp returned with error (exitcode=%s, '
                           'cmd=\"%s\", output=\n\"%s\n\")'%(status, cmd, output))
    elif status == 1:
        is_different = True
    elif status == 0:
        is_different = False
    else:
        raise RuntimeError('invalid exitcode detected')
    return is_different, output

if __name__ == '__main__':
    # create two binary files with different values
    # file 1
    tmp1 = np.arange(10, dtype=np.uint8)
    tmp1.tofile('tmp1')
    # file 2
    tmp2 = np.arange(10, dtype=np.uint8)
    tmp2[5] = 0xFF
    tmp2.tofile('tmp2')
    # compare using the shell command 'cmp'
    is_different, output = run_cmp(filename1='tmp1', filename2='tmp2')
    print 'is_different=%s, output=\n\"\n%s\n\"'%(is_different, output)
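One caveat: `cmp -l` (`--verbose` is the GNU long form) prints the 1-based byte number in decimal and the two differing byte values in octal, not hex. Below is a Python 3 sketch that runs cmp through `subprocess` and converts the values to hex. It assumes a cmp supporting `-l` is on PATH; `run_cmp_hex` is a hypothetical name. Exit codes follow the same convention as above (0 identical, 1 different, 2 trouble). If one file is a prefix of the other, cmp reports EOF on stderr and exits with 1, so the list can be empty even though the files differ.

```python
import subprocess


def run_cmp_hex(filename1, filename2):
    """Run `cmp -l` and return (is_different, [(offset, hex1, hex2), ...]).

    offset is 0-based; hex1/hex2 are two-digit hex strings.
    """
    result = subprocess.run(["cmp", "-l", filename1, filename2],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode > 1:
        raise RuntimeError("cmp failed (exit code %d): %s"
                           % (result.returncode, result.stderr.strip()))
    diffs = []
    for line in result.stdout.splitlines():
        # each line: <1-based decimal byte number> <octal byte 1> <octal byte 2>
        offset, octal1, octal2 = line.split()
        diffs.append((int(offset) - 1,
                      "%02x" % int(octal1, 8),
                      "%02x" % int(octal2, 8)))
    return result.returncode == 1, diffs
```

With the `tmp1`/`tmp2` files created above, this should return `(True, [(5, '05', 'ff')])`.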