edesz

Reputation: 12406

Python difflib for 2 files, with incorrect line numbers

I have to compare 2 files in Python and I am using difflib. I tried this separately with ndiff and then with unified_diff. The contents of the 2 files are straightforward, per the example here:

File_1.txt:

User1 US
User2 US
User3 US

File_2.txt:

User1 US
User2 US
User3 NG

Here is the code that works for me (WITHOUT correct line numbers):

import difflib
import sys

def dif_wr(d):
    # Prefix each diff line with a running 1-based counter
    for i, line in enumerate(d):
        sys.stdout.write('{} {}'.format(i + 1, line))

# Method 1
with open('File_1.txt', 'r') as h0:
    with open('File_2.txt', 'r') as h1:
        dif = difflib.unified_diff(h0.readlines(),
                                   h1.readlines(),
                                   fromfile='File_1.txt', tofile='File_2.txt')
dif_wr(dif)

# Method 2
with open('File_1.txt','r') as fl1, open('File_2.txt','r') as fl2:
    dif2 = difflib.ndiff(fl1.readlines(),fl2.readlines())
dif_wr(dif2)

The output is:

1 --- File_1.txt
2 +++ File_2.txt
3 @@ -1,3 +1,3 @@
4  User1 US
5  User2 US
6 -User3 US7 +User3 NG1   User1 US
2   User2 US
3 - User3 US4 ?       ^^
5 + User3 NG6 ?       ^^

The line numbers seem to be incorrect. The first line of the file content is numbered 4 instead of 1, and some of the output lines run together.

Question

Is there a way to get the correct line numbers with the output?

Upvotes: 1

Views: 2115

Answers (1)

Open AI - Opting Out

Reputation: 24164

The documentation for unified_diff says it takes a parameter n:

Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by n which defaults to three.

Also, parameter lineterm:

For inputs that do not have trailing newlines, set the lineterm argument to "" so that the output will be uniformly newline free.
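
A minimal sketch of why the output in the question runs together, assuming the last line of each file has no trailing newline. The strings stand in for the file contents, and the names `old`, `new`, `raw`, `mixed` and `clean` are only illustrative:

```python
import difflib

# Stand-ins for the two files; the last line has no trailing newline.
old = "User1 US\nUser2 US\nUser3 US"
new = "User1 US\nUser2 US\nUser3 NG"

# keepends=True mimics readlines(): the final line lacks "\n", so writing
# the diff lines back to back fuses that line with whatever follows it.
raw = list(difflib.unified_diff(old.splitlines(keepends=True),
                                new.splitlines(keepends=True)))
mixed = [line for line in raw if not line.endswith("\n")]

# splitlines() without keepends plus lineterm="" gives uniformly
# newline-free output, so the caller can append "\n" to every line.
clean = list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                  lineterm=""))
print(mixed)
print(clean)
```

Here `mixed` holds exactly the two changed lines, which are the ones that appear fused in the question's output.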

To get the output you want, you need to take the line terminators off, and add them back on in the output. You also need to set context lines to zero:

import difflib
import sys

def dif_wr(d):
    # The diff lines carry no terminator (lineterm=""), so add "\n" here
    for i, line in enumerate(d):
        sys.stdout.write('{} {}\n'.format(i + 1, line))

Some example code using strings instead of files:

from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

file1 = """User1 US
User2 US
User3 US"""

file2 = """User1 US
User2 US
User3 NG"""

dif2 = difflib.unified_diff(StringIO(file1).readlines(),
                            StringIO(file2).readlines(),
                            fromfile='File_1.txt',
                            tofile='File_2.txt',
                            n=0,
                            lineterm="")
dif_wr(dif2)

Output:

1 --- File_1.txt
2 +++ File_2.txt
3 @@ -3 +3 @@
4 -User3 US
5 +User3 NG
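
Note that the counter in dif_wr numbers the lines of the diff output, not the lines of either file. If what you want is the position of each changed line within its own file, one possible approach (a sketch, not the only way) is to walk the opcodes from difflib.SequenceMatcher, whose indices refer to the input sequences. The helper name numbered_changes is hypothetical:

```python
import difflib

def numbered_changes(a_lines, b_lines):
    """Yield (sign, line_number, text) for every changed line.

    line_number is 1-based and refers to the file the line comes from:
    '-' lines are numbered in the first file, '+' lines in the second.
    """
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('replace', 'delete'):
            for i in range(i1, i2):
                yield '-', i + 1, a_lines[i]
        if tag in ('replace', 'insert'):
            for j in range(j1, j2):
                yield '+', j + 1, b_lines[j]

file1 = "User1 US\nUser2 US\nUser3 US".splitlines()
file2 = "User1 US\nUser2 US\nUser3 NG".splitlines()

for sign, number, text in numbered_changes(file1, file2):
    print('{} {}{}'.format(number, sign, text))
```

For the example data this reports line 3 of each file as the only change.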

Upvotes: 2
