Reputation: 13
I need to write a script that compares the contents of two text files. Here is an example of what I want:
file1.txt's content:
New York
Los Angeles
Miami
file2.txt's content:
New York
Orlando
Miami
Dc
I want to compare the two files and print the elements that were added or are missing.
Here is my attempt:
from difflib import Differ

myfile1 = input("Enter First File's name for compare : ")
myfile2 = input("Enter Second File's name for compare : ")
# Check the extension without assuming the name contains exactly one dot
if myfile1.endswith(".txt") and myfile2.endswith(".txt"):
    with open(myfile1) as file_1, open(myfile2) as file_2:
        differ = Differ()
        for line in differ.compare(file_1.readlines(), file_2.readlines()):
            print(line)
else:
    print("File format Error !")
I already use difflib, but it puts "-" in front of content that is missing and "+" in front of content that was added. I need to print only the added and missing contents.
Upvotes: 0
Views: 9765
Reputation: 35
First, read all lines of both files:
with open('file1.txt') as f1:
    a = f1.readlines()
with open('file2.txt') as f2:
    b = f2.readlines()
In Python 3.10 or higher you can open both files in one parenthesized with statement:
with (
    open('file1.txt') as f1,
    open('file2.txt') as f2,
):
    a = f1.readlines()
    b = f2.readlines()
Now print the differences between the first lines of a and b, word by word:
import difflib

a_sample = a[0]  # 'New York Los Angeles Miami'
b_sample = b[0]  # 'New York Orlando Miami Dc'
# Put each word on its own line so ndiff compares word by word
diff = difflib.ndiff(
    a_sample.replace(' ', '\n').splitlines(keepends=True),
    b_sample.replace(' ', '\n').splitlines(keepends=True),
)
print(''.join(diff), end="")
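Output ("- Miami" and "+ Miami" share a line because the last word of the first sample has no trailing newline):
  New
  York
+ Orlando
- Los
- Angeles
- Miami+ Miami
?      +
+ Dc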
To do the same for every line of the files:
# zip stops at the end of the shorter file
for file1_line, file2_line in zip(a, b):
    diff = difflib.ndiff(
        file1_line.replace(' ', '\n').splitlines(keepends=True),
        file2_line.replace(' ', '\n').splitlines(keepends=True),
    )
    print(''.join(diff), end="")
Meaning of the difflib symbols:
code | meaning |
---|---|
'- ' | line unique to sequence 1 |
'+ ' | line unique to sequence 2 |
'  ' | line common to both sequences |
'? ' | line not present in either input sequence |
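To see the four codes from the table in practice, here is a minimal sketch that groups ndiff output by its two-character prefix. The function name group_by_code and the sample lines are made up for illustration; "Miami!" is deliberately close to "Miami" so that difflib emits a '? ' hint line.

```python
import difflib
from collections import defaultdict


def group_by_code(a_lines, b_lines):
    """Group ndiff entries by their two-character code ('- ', '+ ', '  ', '? ')."""
    groups = defaultdict(list)
    for entry in difflib.ndiff(a_lines, b_lines):
        # The first two characters are the code, the rest is the line itself
        groups[entry[:2]].append(entry[2:])
    return dict(groups)


# Illustrative sample data, not taken from real files
a_lines = ["New York\n", "Los Angeles\n", "Miami\n"]
b_lines = ["New York\n", "Orlando\n", "Miami!\n"]

groups = group_by_code(a_lines, b_lines)
for code, lines in groups.items():
    print(repr(code), [line.rstrip("\n") for line in lines])
```

The '? ' entries are not part of either file; they only mark where two similar lines differ, so they are usually skipped when reporting results.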
Note: you can iterate over the diff output and print only the + or - entries.
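The note above can be sketched as follows, using the sample data from the question and comparing whole lines rather than words. The function name added_and_removed is hypothetical, and the lines are assumed to be already stripped of their trailing newlines.

```python
import difflib


def added_and_removed(a_lines, b_lines):
    """Return (added, removed) lines going from a_lines to b_lines."""
    added, removed = [], []
    for entry in difflib.ndiff(a_lines, b_lines):
        code, text = entry[:2], entry[2:]
        if code == "+ ":
            added.append(text)
        elif code == "- ":
            removed.append(text)
        # '  ' (common) and '? ' (hint) entries are skipped
    return added, removed


# Sample data from the question, without trailing newlines
file1_lines = ["New York", "Los Angeles", "Miami"]
file2_lines = ["New York", "Orlando", "Miami", "Dc"]

added, removed = added_and_removed(file1_lines, file2_lines)
print("Added:", added)      # Added: ['Orlando', 'Dc']
print("Missing:", removed)  # Missing: ['Los Angeles']
```

With real files, the two lists could be built with f.read().splitlines() on each open file, which drops the line endings.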
Python documentation: https://docs.python.org/3/library/difflib.html
Upvotes: 0
Reputation: 4137
If you want to compare the files character by character, first read each one into a string:
with open("file1.txt", 'r') as file:  # Same thing with file2
    content1 = file.read()
...
Then iterate over the characters:
min_len = min(len(content1), len(content2))
for i in range(min_len):  # only go as far as the shorter content
    if content1[i] != content2[i]:
        # Found a difference between these two characters; do something with it
        print(i, content1[i], content2[i])
# The longer content has extra characters from index min_len onward
# (e.g. content1[min_len:]), which need to be handled separately
If you want to compare word by word instead, split the input first:
content1 = file.read().split(' ')
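Putting the pieces above together, here is a rough sketch of a position-by-position comparison that works for both strings and word lists. It swaps the manual min_len loop for itertools.zip_longest so the extra tail of the longer input is reported too; the function name positional_differences is an assumption, and None stands for "nothing at this position".

```python
from itertools import zip_longest


def positional_differences(seq1, seq2):
    """Return (index, item1, item2) for every position where the inputs differ."""
    diffs = []
    # zip_longest pads the shorter input with None, so the tail is not lost
    for i, (x, y) in enumerate(zip_longest(seq1, seq2)):
        if x != y:
            diffs.append((i, x, y))
    return diffs


# Character-level comparison of two strings
char_diffs = positional_differences("abc", "abxd")
print(char_diffs)  # [(2, 'c', 'x'), (3, None, 'd')]

# Word-level comparison after splitting on spaces
words1 = "New York Los Angeles Miami".split(" ")
words2 = "New York Orlando Miami Dc".split(" ")
word_diffs = positional_differences(words1, words2)
print(word_diffs)
```

Note that a purely positional comparison reports every item after an insertion or deletion as different ("Miami" at index 4 is compared with "Dc" here), so difflib is usually the closer fit when the goal is to list added and missing entries.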
Upvotes: 0