Reputation: 12406
I have to compare 2 files in Python and I am using difflib. I tried this separately with ndiff and then with unified_diff. The contents of the 2 files are straightforward, per the example here:
File_1.txt:
User1 US
User2 US
User3 US
File_2.txt:
User1 US
User2 US
User3 NG
Here is the code that works for me (WITHOUT correct line numbers):
import difflib
import sys
def dif_wr(d):
    for i, line in enumerate(d):
        sys.stdout.write('{} {}'.format(i + 1, line))
# Method 1
with open('File_1.txt', 'r') as h0:
    with open('File_2.txt', 'r') as h1:
        dif = difflib.unified_diff(h0.readlines(),
                                   h1.readlines(),
                                   fromfile='File_1.txt', tofile='File_2.txt')
dif_wr(dif)
# Method 2
with open('File_1.txt', 'r') as fl1, open('File_2.txt', 'r') as fl2:
    dif2 = difflib.ndiff(fl1.readlines(), fl2.readlines())
dif_wr(dif2)
The output is:
1 --- File_1.txt
2 +++ File_2.txt
3 @@ -1,3 +1,3 @@
4 User1 US
5 User2 US
6 -User3 US7 +User3 NG1 User1 US
2 User2 US
3 - User3 US4 ? ^^
5 + User3 NG6 ? ^^
The line numbers are incorrect. In the unified_diff output, for example, the first line of file content is numbered 4 instead of 1.
Question
Is there a way to get the correct line numbers with the output?
Upvotes: 1
Views: 2115
Reputation: 24164
The documentation for unified_diff says it takes a parameter n:
Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by n which defaults to three.
Also, the parameter lineterm:
For inputs that do not have trailing newlines, set the lineterm argument to "" so that the output will be uniformly newline free.
To get the output you want, you need to take the line terminators off, and add them back on in the output. You also need to set context lines to zero:
import difflib
import sys
def dif_wr(d):
    for i, line in enumerate(d):
        sys.stdout.write('{} {}\n'.format(i + 1, line))
Some example code using strings instead of files:
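# Python 2 import; on Python 3 use: from io import StringIO
from StringIO import StringIO
file1 = """User1 US
User2 US
User3 US"""
file2 = """User1 US
User2 US
User3 NG"""
# n=0 drops the context lines; lineterm="" keeps the header lines newline free
dif2 = difflib.unified_diff(StringIO(file1).readlines(),
                            StringIO(file2).readlines(),
                            fromfile='File_1.txt',
                            tofile='File_2.txt',
                            n=0,
                            lineterm="")
dif_wr(dif2)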
Output:
1 --- File_1.txt
2 +++ File_2.txt
3 @@ -3,1 +3,1 @@
4 -User3 US
5 +User3 NG
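Note that the numbers printed by dif_wr count lines of the diff output, not lines of the input files. If what is wanted is the position of each changed line within its own file, one possible approach is to read the ranges reported by difflib.SequenceMatcher.get_opcodes(). The following is only a sketch under that assumption, not part of the code above; the helper name changed_lines is made up for illustration.

```python
import difflib


def changed_lines(a, b):
    """Yield (sign, line_number, text) for every line that differs.

    Line numbers are 1-based positions in the sequence the line came from:
    '-' entries refer to `a`, '+' entries refer to `b`.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'delete' both remove lines a[i1:i2] from the first file
        if tag in ('replace', 'delete'):
            for i in range(i1, i2):
                yield '-', i + 1, a[i]
        # 'replace' and 'insert' both add lines b[j1:j2] to the second file
        if tag in ('replace', 'insert'):
            for j in range(j1, j2):
                yield '+', j + 1, b[j]


# splitlines() drops the line terminators, so print() can add them back
file1 = "User1 US\nUser2 US\nUser3 US".splitlines()
file2 = "User1 US\nUser2 US\nUser3 NG".splitlines()

for sign, number, text in changed_lines(file1, file2):
    print('{} {}: {}'.format(sign, number, text))
```

For the two example files this reports line 3 on both sides: the removed "User3 US" and the added "User3 NG". With real files, the same function could be fed the result of read().splitlines() on each file handle.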
Upvotes: 2