user14329351

Comparing 2 text files in python

I have 2 text files. I want to compare the 2 text files and return a list that has every line number that is different. Right now, I think my code returns the lines that are different, but how do I return the line number instead?

def diff(filename1, filename2):
    with open('./exercise-files/text_a.txt', 'r') as filename1:
        with open('./exercise-files/text_b.txt', 'r') as filename2:
            difference = set(filename1).difference(filename2)

    difference.discard('\n')

    with open('diff.txt', 'w') as file_out:
        for line in difference:
            file_out.write(line)

Testing on:

diff('./exercise-files/text_a.txt', './exercise-files/text_b.txt') == [3, 4, 6]
diff('./exercise-files/text_a.txt', './exercise-files/text_a.txt') == []

Upvotes: 1

Views: 137

Answers (2)

Thomas

Reputation: 182083

# filename1 and filename2 are the open file objects from the question's with blocks
difference = [
    line_number + 1 for line_number, (line1, line2)
    in enumerate(zip(filename1, filename2))
    if line1 != line2
]

zip takes two (or more) iterables and returns an iterator of tuples, where each tuple contains the corresponding entries of each iterable; it stops as soon as the shortest iterable is exhausted. enumerate takes this iterator and returns an iterator of tuples, where the first element is the index and the second is the value from the original iterator. Because the index starts at 0, adding 1 turns it into a line number, and the condition keeps only the numbers of the lines that differ.
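
For completeness, here is a minimal sketch of how the comprehension might sit inside the whole function. It assumes diff should open the two paths it is given (rather than the hard-coded paths in the question) and return the list. Passing start=1 to enumerate is an alternative to adding 1 by hand.

```python
def diff(filename1, filename2):
    """Return the 1-based numbers of the lines that differ between two files."""
    # Open the paths passed in as arguments; file objects iterate line by line.
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        return [
            line_number
            # start=1 makes enumerate count from 1, so no "+ 1" is needed.
            for line_number, (line1, line2) in enumerate(zip(file1, file2), start=1)
            if line1 != line2
        ]
```

Since zip stops at the shorter file, any extra lines at the end of the longer file are not reported.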

Upvotes: 1

alani

Reputation: 13079

Here is an example that ignores any surplus lines if one file has more lines than the other. The key is to use enumerate when iterating, which gives the line number as well as the contents. next fetches a line from the second file's iterator, which the for loop does not consume directly.

def diff(filename1, filename2):
    difference_line_numbers = []
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        for line_number, contents1 in enumerate(file1, 1):
            try:
                contents2 = next(file2)
            except StopIteration:
                break
            if contents1 != contents2:
                difference_line_numbers.append(line_number)
    return difference_line_numbers
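
If the surplus lines should count as differences instead of being ignored, one possible variation is to pair the lines with itertools.zip_longest, which pads the shorter file with None. This is only a sketch of that alternative, not part of the answer above, and the name diff_with_surplus is made up for illustration.

```python
from itertools import zip_longest


def diff_with_surplus(filename1, filename2):
    """Like diff, but also report line numbers present in only one file."""
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        return [
            line_number
            # zip_longest fills the shorter file with None, and None never
            # equals a line of text, so every surplus line is reported.
            for line_number, (line1, line2) in enumerate(zip_longest(file1, file2), start=1)
            if line1 != line2
        ]
```

Note that lines are compared including their line endings, so a final line without a trailing newline differs from the same text followed by one.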

Upvotes: 0
