Reputation: 10775
I tried
filecmp.cmp(file1,file2)
but it doesn't work, since the files are identical except for their newline characters. Is there an option for that in filecmp, or some other convenience function/library, or do I have to read both files line by line and compare those?
Upvotes: 2
Views: 1735
Reputation: 81
The source code for filecmp.cmp() has this for the comparison part:
BUFSIZE = 8*1024

def _do_cmp(f1, f2):
    bufsize = BUFSIZE
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            if b1 != b2:
                return False
            if not b1:
                return True
I modified that to make:
def universal_filecmp(f1, f2):
    with open(f1, 'r') as fp1, open(f2, 'r') as fp2:
        while True:
            b1 = fp1.readline()
            b2 = fp2.readline()
            if b1 != b2:
                return False
            if not b1:
                return True
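In Python 3, opening a file in text mode ('r') translates '\r\n' and '\r' line endings to '\n' by default, so the line-by-line comparison ignores the newline style. In Python 2 you can add 'U' to the mode to get the same behaviour. I tested this code in a test bench for a package I am working on and it seems to work.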
Upvotes: 1
Reputation: 88747
It looks like you just need to check whether the files are the same, ignoring whitespace/newlines.
You can use a function like this:
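def do_cmp(f1, f2):
    # Read each file in full: once newline bytes are stripped, fixed-size
    # chunks from the two files no longer line up, so comparing chunk by
    # chunk can report a false mismatch.
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        return is_same(fp1.read(), fp2.read())

def is_same(data1, data2):
    # Compare the raw bytes with all newline characters removed
    return strip_newlines(data1) == strip_newlines(data2)

def strip_newlines(data):
    # Drop both "\r" and "\n" so "\r\n", "\n" and "\r" endings all vanish
    return data.replace(b"\r", b"").replace(b"\n", b"")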
You can improve is_same so that it matches according to your requirements, e.g. you may want to ignore case too.
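As a rough sketch of that kind of extension, the variant below adds an optional case-insensitive comparison. The name is_same_relaxed and the ignore_case parameter are made up for illustration, and the function assumes it is given the complete text of each file as str, not partial chunks.

```python
def is_same_relaxed(text1, text2, ignore_case=False):
    """Compare two strings while ignoring newline characters.

    Hypothetical helper: pass the full contents of each file.
    """
    # Remove every "\r" and "\n" so the newline style does not matter.
    text1 = text1.replace("\r", "").replace("\n", "")
    text2 = text2.replace("\r", "").replace("\n", "")
    if ignore_case:
        # casefold() is a more aggressive lower() intended for
        # caseless matching.
        text1 = text1.casefold()
        text2 = text2.casefold()
    return text1 == text2
```

Note that stripping newlines entirely also makes "a\nb" equal to "ab"; if line boundaries matter, comparing line by line in text mode is the safer choice.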
Upvotes: 0
Reputation: 86362
Try the difflib module: it provides classes and functions for comparing sequences.
For your needs, the difflib.Differ class looks interesting.
class difflib.Differ
This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.
See the Differ example, which compares two texts. The sequences being compared can also be obtained from the readlines() method of file-like objects.
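A minimal sketch of that approach, assuming Python 3 and files small enough to read into memory. The helper names changed_lines and files_differ are hypothetical; only difflib.Differ and its compare() method come from the standard library.

```python
import difflib


def changed_lines(lines1, lines2):
    """Return only the Differ output lines that mark a removal or addition."""
    differ = difflib.Differ()
    # Differ.compare yields lines prefixed with "  " (common), "- " (only in
    # the first sequence), "+ " (only in the second) or "? " (hint lines).
    return [line for line in differ.compare(lines1, lines2)
            if line.startswith(("- ", "+ "))]


def files_differ(path1, path2):
    """Return True if the files differ, ignoring newline style."""
    # Text mode translates "\r\n" and "\r" to "\n" before difflib sees them.
    with open(path1, "r") as fp1, open(path2, "r") as fp2:
        return bool(changed_lines(fp1.readlines(), fp2.readlines()))
```

Differ is mainly useful when you also want to show what changed; for a plain yes/no answer, a line-by-line equality check does less work.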
Upvotes: 1
Reputation: 70148
I think a simple convenience function like this should do the job:
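from itertools import zip_longest  # Python 3; the Python 2 name is izip_longest

def areFilesIdentical(filename1, filename2):
    # Text mode applies universal newlines by default in Python 3
    with open(filename1, "r") as a, open(filename2, "r") as b:
        # all() is lazy and stops at the first line that's not identical;
        # zip_longest pads the shorter file with None, so a difference in
        # length is reported as a mismatch instead of being ignored
        return all(lineA == lineB
                   for lineA, lineB in zip_longest(a, b))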
Upvotes: 5