Ayaz ARMUTLU

Reputation: 13

How can I compare the contents of two text files in Python?

I need to write a script that compares the contents of two text files. Here is an example of what I want:

file1.txt's content:

New York
Los Angeles
Miami

file2.txt's content:

New York
Orlando
Miami
Dc

I want to compare the two files and print the elements that were added or are missing.

My code attempt is here:

from difflib import Differ

myfile1 = input("Enter First File's name for compare : ")
myfile2 = input("Enter Second File's name for compare : ")

ch1 = myfile1.split(".")
ch2 = myfile2.split(".")

if ch1[1] == "txt" and ch2[1] == "txt":
    with open(myfile1) as file_1, open(myfile2) as file_2:
        differ = Differ()

        for line in differ.compare(file_1.readlines(), file_2.readlines()):
            print(line)
    
else:
    print("File format error!")

I already use difflib, but it puts "-" in front of a line that is missing and "+" in front of a line that was added. I need to print the added and missing contents themselves.

Upvotes: 0

Views: 9765

Answers (2)

Omid

Reputation: 35

First, read all lines of both files:

with open('file1.txt') as f1:
    a = f1.readlines()
with open('file2.txt') as f2:
    b = f2.readlines()

On Python 3.10 or higher, both files can be opened in one parenthesized with statement:

with (
    open('file1.txt') as f1,
    open('file2.txt') as f2,
):
    a = f1.readlines()
    b = f2.readlines()

Now print the differences between the first line of each file, word by word:

import difflib
a_sample = a[0] # 'New York Los Angeles Miami'
b_sample = b[0] # 'New York Orlando Miami Dc'
# Put each word on its own line so ndiff compares word by word
diff = difflib.ndiff(
    a_sample.replace(' ', '\n').splitlines(keepends=True),
    b_sample.replace(' ', '\n').splitlines(keepends=True)
)
print(''.join(diff), end="")
  New
  York
+ Orlando
- Los
- Angeles
- Miami+ Miami
?      +
+ Dc

To do the same for every pair of lines in the two files (zip stops at the end of the shorter file):

for file1_line, file2_line in zip(a, b):
    diff = difflib.ndiff(
        file1_line.replace(' ', '\n').splitlines(keepends=True),
        file2_line.replace(' ', '\n').splitlines(keepends=True)
    )
    print(''.join(diff), end="")

Meaning of the difflib prefixes:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

Note: you can iterate over the diff output and print only the entries that start with + or -.
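The note above could be sketched roughly as follows. The helper name added_and_missing and the hard-coded sample lists are only illustrative assumptions (they mirror the file contents from the question, one city per line, with newlines already stripped); in a real script the lists would come from reading the files.

```python
import difflib

def added_and_missing(lines_a, lines_b):
    """Split ndiff output into lines only in lines_b (added) and only in lines_a (missing)."""
    added, missing = [], []
    for entry in difflib.ndiff(lines_a, lines_b):
        if entry.startswith("+ "):
            added.append(entry[2:])    # drop the two-character prefix
        elif entry.startswith("- "):
            missing.append(entry[2:])
        # entries starting with "  " (common) or "? " (hints) are skipped
    return added, missing

# Sample data taken from the question (hypothetical stand-in for the real files)
file1_lines = ["New York", "Los Angeles", "Miami"]
file2_lines = ["New York", "Orlando", "Miami", "Dc"]

added, missing = added_and_missing(file1_lines, file2_lines)
print("Added:", added)
print("Missing:", missing)
```

With lines read from real files, something like [line.rstrip("\n") for line in f] before the comparison keeps trailing newlines out of the result.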

Python documentation: https://docs.python.org/3/library/difflib.html

Upvotes: 0

FLAK-ZOSO

Reputation: 4137

If you want to compare the files character by character, read each file into a string and iterate over the characters:

with open("file1.txt", 'r') as file: # Same thing with file2
    content1 = file.read()
...

Like this:

min_len = min(len(content1), len(content2))
for i in range(min_len):  # compare only up to the shorter length
    if content1[i] != content2[i]:
        # The characters at index i differ; report them (or do something else)
        print(i, repr(content1[i]), repr(content2[i]))
# The longer text has leftover characters from index min_len onward
extra = content1[min_len:] or content2[min_len:]

If you want to compare word by word instead of character by character, split the input first:

content1 = file.read().split(' ')
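A minimal sketch of such a word-by-word comparison, assuming the contents are already in strings. The function name differing_words and the sample strings are made up for illustration, and split() without an argument is used here so that newlines also act as separators, which differs slightly from split(' ').

```python
from itertools import zip_longest

def differing_words(text1, text2):
    """Yield (index, word1, word2) for each position where the words differ."""
    words1 = text1.split()
    words2 = text2.split()
    # zip_longest pads the shorter list with None, so extra words are reported too
    for i, (w1, w2) in enumerate(zip_longest(words1, words2)):
        if w1 != w2:
            yield i, w1, w2

# Hypothetical sample contents
content1 = "New York Los Angeles Miami"
content2 = "New York Orlando Miami Dc"

for index, w1, w2 in differing_words(content1, content2):
    print(index, w1, w2)
```

This compares strictly by position, so a single inserted word shifts everything after it; difflib is the better fit when the two texts need to be aligned first.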

Upvotes: 0
