Reputation: 1032
I have 2 files called "hosts" (in different directories)
I want to compare them using python to see if they are IDENTICAL. If they are not Identical, I want to print the difference on the screen.
So far I have tried this
hosts0 = open(dst1 + "/hosts","r")
hosts1 = open(dst2 + "/hosts","r")
lines1 = hosts0.readlines()
for i,lines2 in enumerate(hosts1):
if lines2 != lines1[i]:
print "line ", i, " in hosts1 is different \n"
print lines2
else:
print "same"
But when I run this, I get
File "./audit.py", line 34, in <module>
if lines2 != lines1[i]:
IndexError: list index out of range
Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?
Upvotes: 64
Views: 163570
Reputation: 419
hosts0 = open("C:path\\a.txt","r")
hosts1 = open("C:path\\b.txt","r")
lines1 = hosts0.readlines()
for i,lines2 in enumerate(hosts1):
if lines2 != lines1[i]:
print "line ", i, " in hosts1 is different \n"
print lines2
else:
print "same"
The above code is working for me. Can you please indicate what error you are facing?
Upvotes: 2
Reputation: 11
import difflib
f=open('a.txt','r') #open a file
f1=open('b.txt','r') #open another file to compare
str1=f.read()
str2=f1.read()
str1=str1.split() #split the words in file by default through the spce
str2=str2.split()
d=difflib.Differ() # compare and just print
diff=list(d.compare(str2,str1))
print '\n'.join(diff)
Upvotes: 1
Reputation: 1345
import difflib
lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()
lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()
# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
print line
Outputs the following:
--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
dog
-cat
bird
buffalo
-gophers
-hound
+gopher
horse
+mouse
This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
print line
Outputting this:
--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse
But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
print line
Giving us this output:
+cat
-cat
-gophers
-hound
+gopher
+mouse
Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:
diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']
print 'additions:'
for line in added:
print line
print
print 'additions, ignoring position'
for line in added:
if line not in removed:
print line
Outputting:
additions:
cat
gopher
mouse
additions, ignoring position:
gopher
mouse
You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
Upvotes: 102
Reputation:
You can add an conditional statement. If your array goes beyond index, then break and print the rest of the file.
Upvotes: 1
Reputation: 1345
The difflib library is useful for this, and comes in the standard library. I like the unified diff format.
http://docs.python.org/2/library/difflib.html#difflib.unified_diff
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
)
for line in diff:
sys.stdout.write(line)
Outputs:
--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
one
two
-dogs
three
And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
n=0,
)
for line in diff:
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
sys.stdout.write(line[1:])
Upvotes: 20