Reputation: 57973
Given two text files A and B, what is an easy way to get the line numbers of the lines in B that are not present in A? I see there's difflib, but I don't see an interface for retrieving line numbers.
Upvotes: 6
Views: 4041
Reputation: 36564
difflib can give you what you need. Assume:
a.txt
this
is
a
bunch
of
lines
b.txt
this
is
a
different
bunch
of
other
lines
code like this:
import difflib

fileA = open("a.txt", "rt").readlines()
fileB = open("b.txt", "rt").readlines()

d = difflib.Differ()
diffs = d.compare(fileA, fileB)
lineNum = 0

for line in diffs:
    # split off the two-character difflib code
    code = line[:2]
    # if the line is in both files or just b, increment the line number.
    if code in ("  ", "+ "):
        lineNum += 1
    # if this line is only in b, print the line number and the text on the line
    if code == "+ ":
        print("%d: %s" % (lineNum, line[2:].strip()))
gives output like:
bgporter@varese ~/temp:python diffy.py
4: different
7: other
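You'll also want to look at the difflib code "? " and decide how you want to handle that one. Differ emits those lines as hints that mark intraline differences; they are not present in either input file.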
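If the "? " hint lines are a nuisance, one possible alternative is to skip Differ and read the index ranges from difflib.SequenceMatcher.get_opcodes() directly. This is only a sketch, not the approach used above; the function name lines_only_in_b is made up for illustration, and results on very large or repetitive files may differ from Differ's output.

```python
import difflib


def lines_only_in_b(lines_a, lines_b):
    """Return 1-based line numbers of B that the matcher reports as inserted or replaced."""
    # autojunk=False turns off the "popular element" heuristic for long inputs.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    numbers = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" blocks cover lines_b[j1:j2], which have no match in A.
        if tag in ("insert", "replace"):
            numbers.extend(range(j1 + 1, j2 + 1))
    return numbers
```

Each opcode is a (tag, i1, i2, j1, j2) tuple, so the line numbers in B come straight from j1 and j2 and no prefix parsing is needed.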
(Also, in real code you'd want to use context managers to make sure the files get closed, and so on.)
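A rough sketch of what that could look like, with the loop wrapped in a function and the files opened in a with block. The names added_lines and added_lines_from_files are hypothetical, and skipping the "? " lines is just one way to handle them.

```python
import difflib


def added_lines(lines_a, lines_b):
    """Return (line_number, text) pairs for lines difflib marks as only in B."""
    result = []
    line_num = 0
    for entry in difflib.Differ().compare(lines_a, lines_b):
        code = entry[:2]
        if code == "? ":
            # Hint line produced by difflib; it exists in neither file.
            continue
        if code in ("  ", "+ "):
            # Line exists in B, so it advances B's line counter.
            line_num += 1
        if code == "+ ":
            result.append((line_num, entry[2:].rstrip("\n")))
    return result


def added_lines_from_files(path_a, path_b):
    # Both files are closed automatically when the with block exits.
    with open(path_a) as file_a, open(path_b) as file_b:
        return added_lines(file_a.readlines(), file_b.readlines())
```

Calling added_lines_from_files("a.txt", "b.txt") on the sample files above should report the same two lines, 4 and 7.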
Upvotes: 12
Reputation: 22669
A poor man's solution:
with open('A.txt') as f:
    linesA = f.readlines()
with open('B.txt') as f:
    linesB = f.readlines()

# enumerate() counts from 0, so these are 0-based indexes into B
print([k for k, v in enumerate(linesB) if v not in linesA])
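A small variation on the same idea, assuming 1-based line numbers are wanted and the files may be large: put A's lines in a set so each lookup is constant time instead of a scan of the whole list. The function name missing_line_numbers is made up here. Note that this is a pure membership test, so unlike difflib it ignores position: a line of B that appears anywhere in A counts as present.

```python
def missing_line_numbers(lines_a, lines_b):
    """Return 1-based numbers of lines in B whose text appears nowhere in A."""
    # Strip the newline so a final line without one still compares equal.
    seen = {line.rstrip("\n") for line in lines_a}
    return [
        number
        for number, line in enumerate(lines_b, start=1)
        if line.rstrip("\n") not in seen
    ]
```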
Upvotes: 0