Reputation:
I have two text files. I want to compare them and return a list of every line number where they differ. Right now, I think my code returns the lines that are different, but how do I return the line numbers instead?
def diff(filename1, filename2):
    with open('./exercise-files/text_a.txt', 'r') as filename1:
        with open('./exercise-files/text_b.txt', 'r') as filename2:
            difference = set(filename1).difference(filename2)
    difference.discard('\n')
    with open('diff.txt', 'w') as file_out:
        for line in difference:
            file_out.write(line)
Testing on:
diff('./exercise-files/text_a.txt', './exercise-files/text_b.txt') == [3, 4, 6]
diff('./exercise-files/text_a.txt', './exercise-files/text_a.txt') == []
Upvotes: 1
Views: 137
Reputation: 182083
difference = [
    line_number + 1 for line_number, (line1, line2)
    in enumerate(zip(filename1, filename2))
    if line1 != line2
]
zip takes two (or more) iterables and returns an iterator of tuples, where each tuple contains the corresponding entries of each iterable. enumerate takes this iterator and returns an iterator of tuples, where the first element is the index and the second is the value from the original iterator. Because enumerate counts from 0 by default, the comprehension adds 1 to get 1-based line numbers. It's straightforward from there.
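Put together as a complete function, the same idea might look like the minimal sketch below. It assumes the two arguments are file paths (the question's code passes paths but then ignores them), and the name diff_line_numbers is only illustrative. Since zip stops at the shorter input, surplus lines in the longer file are not reported.

```python
def diff_line_numbers(path1, path2):
    """Return the 1-based numbers of lines that differ between two text files.

    Assumption: path1 and path2 are paths to text files. Comparison stops
    at the end of the shorter file, because zip stops at the shorter input.
    """
    with open(path1, "r") as file1, open(path2, "r") as file2:
        return [
            line_number
            # start=1 makes enumerate count from 1, so no "+ 1" is needed.
            for line_number, (line1, line2) in enumerate(zip(file1, file2), start=1)
            if line1 != line2
        ]
```

Called on a file and itself, this returns an empty list, which matches the second test case in the question.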
Upvotes: 1
Reputation: 13079
Here is an example which will ignore any surplus lines if one file has more lines than the other. The key is to use enumerate when iterating to get the line number as well as the contents. next can be used to get a line from the file iterator which is not used directly by the for loop.
def diff(filename1, filename2):
    difference_line_numbers = []
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        for line_number, contents1 in enumerate(file1, 1):
            try:
                contents2 = next(file2)
            except StopIteration:
                break
            if contents1 != contents2:
                difference_line_numbers.append(line_number)
    return difference_line_numbers
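The function above stops at the end of the shorter file. If surplus lines should count as differences instead, one possible variant uses itertools.zip_longest from the standard library. This is a sketch of a different behaviour, not what the answer above does, and the name diff_with_surplus is hypothetical.

```python
from itertools import zip_longest


def diff_with_surplus(filename1, filename2):
    """Return 1-based line numbers that differ, counting surplus lines too.

    Assumption: a line present in only one file counts as a difference.
    """
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        return [
            line_number
            # zip_longest pads the shorter file with None (the default
            # fillvalue). A line read from a file is never None, so every
            # surplus line compares as different.
            for line_number, (line1, line2) in enumerate(zip_longest(file1, file2), 1)
            if line1 != line2
        ]
```

Note that a final line without a trailing newline compares as different from the same text with one, since the lines are compared exactly as read.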
Upvotes: 0