Trevor Boyd Smith

Reputation: 19273

Python: How to compare two binary files?

In Python, I need to print a diff of two binary files. I looked at difflib.Differ, which does a lot.

However, Differ assumes lines of text, so its output does not list the byte index or the hex values of the differing bytes.

What I need is output that shows which byte differs (its index) and the actual hex values of the two bytes.

In Python, how do you compare two binary files (output: the index of each differing byte and the hex values of the two bytes)?

I was doing something like:

#!/usr/bin/env python2
import difflib
x = open('/path/to/file1', 'rb').read()  # 'rb': read raw bytes, no newline translation
y = open('/path/to/file2', 'rb').read()
print '\n'.join(difflib.Differ().compare(x, y))

But this doesn't output the byte index where the difference is. And it doesn't print the hex values.

Upvotes: 11

Views: 26777

Answers (2)

meh93

Reputation: 334

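When difflib compares two strings, it yields one entry per char, prefixed with "- ", "+ " or two spaces. The code below compares x and y and then inspects the output:

import difflib
d = difflib.Differ()
e = list(d.compare(x, y))     # compare() returns a generator, so collect it into a list
for i, entry in enumerate(e):
    if entry.startswith("-"): # "- " marks a char that is only in x
        print("%s at diff position %d is different" % (entry, i))

Entries that start with "- " are present only in x, and entries that start with "+ " are present only in y. Entries that start with two spaces are common to both. Note that i is the position in the diff output, not the byte offset in the file.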
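For the byte index and hex values the question asks for, a direct byte-by-byte comparison may be simpler than difflib. The following is a minimal Python 3 sketch, not part of the original answer; the names `diff_bytes` and `format_diff` are hypothetical, and it assumes both inputs fit in memory and that a positional comparison (no insert/delete alignment) is what is wanted.

```python
def diff_bytes(data1, data2):
    """Yield (offset, byte1, byte2) for each position where the inputs differ.

    Both arguments are bytes objects. A byte is reported as None when one
    input ends before the other.
    """
    for offset in range(max(len(data1), len(data2))):
        b1 = data1[offset] if offset < len(data1) else None
        b2 = data2[offset] if offset < len(data2) else None
        if b1 != b2:
            yield offset, b1, b2


def format_diff(offset, b1, b2):
    """Render one difference as '0x<offset>: <hex> != <hex>' ('--' marks EOF)."""
    def as_hex(b):
        return "--" if b is None else "%02x" % b
    return "0x%08x: %s != %s" % (offset, as_hex(b1), as_hex(b2))
```

To compare files, read each one with `open(path, 'rb').read()` and pass the two results to `diff_bytes`, printing `format_diff(*d)` for each item. Offsets here are zero-based.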

Upvotes: 0

Trevor Boyd Smith

Reputation: 19273

The shell command cmp already does exactly what I need/want. Reinventing that functionality in Python would be more effort/code/time... so I just called the command from Python:

#!/usr/bin/env python2
import commands
import numpy as np
def run_cmp(filename1, filename2):
    cmd = 'cmp --verbose %s %s'%(filename1, filename2)
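    status, output = commands.getstatusoutput(cmd) # FYI: the `commands` module was removed in Python 3 (use `subprocess` there)
    # getstatusoutput returns a wait()-style status with the exit code in the
    # high byte, so 256 means exit code 1; the modulo maps 256 -> 1 and 512 -> 2
    status = status if status < 255 else status%255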
    if status > 1:
        raise RuntimeError('cmp returned with error (exitcode=%s, '
                'cmd=\"%s\", output=\n\"%s\n\")'%(status, cmd, output))
    elif status == 1:
        is_different = True
    elif status == 0:
        is_different = False
    else:
        raise RuntimeError('invalid exitcode detected')
    return is_different, output
if __name__ == '__main__':
    # create two binary files with different values
    # file 1
    tmp1 = np.arange(10, dtype=np.uint8)
    tmp1.tofile('tmp1')
    # file 2
    tmp2 = np.arange(10, dtype=np.uint8)
    tmp2[5] = 0xFF
    tmp2.tofile('tmp2')
    # compare using the shell command 'cmp'
    is_different, output = run_cmp(filename1='tmp1', filename2='tmp2')
    print 'is_different=%s, output=\n\"\n%s\n\"'%(is_different, output)
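Since the `commands` module does not exist in Python 3, the same approach can be sketched with `subprocess`. This is an assumption-laden sketch rather than the answer's original code: `parse_cmp_verbose` is a hypothetical helper, and it relies on `cmp --verbose` printing a 1-based decimal byte number followed by the two byte values in octal, which the helper converts to zero-based offsets and integers that can then be printed as hex.

```python
import subprocess


def run_cmp(filename1, filename2):
    """Run `cmp --verbose` and return (is_different, stdout)."""
    # Passing a list (no shell) avoids quoting problems with unusual filenames.
    result = subprocess.run(
        ["cmp", "--verbose", filename1, filename2],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # cmp exit codes: 0 = identical, 1 = different, >1 = trouble
    if result.returncode > 1:
        raise RuntimeError("cmp failed (exitcode=%s): %s"
                           % (result.returncode, result.stderr))
    return result.returncode == 1, result.stdout


def parse_cmp_verbose(output):
    """Convert `cmp --verbose` lines into (zero_based_offset, byte1, byte2)."""
    diffs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that do not have exactly three fields
        # field 0: 1-based decimal byte number; fields 1 and 2: octal byte values
        diffs.append((int(fields[0]) - 1, int(fields[1], 8), int(fields[2], 8)))
    return diffs
```

For the tmp1/tmp2 example above, each parsed tuple could be printed with something like `'%d: 0x%02x 0x%02x' % diff`. Keeping stderr separate from stdout means messages such as an EOF notice do not reach the parser.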

Upvotes: 0
