Reputation: 3894
I have two text files that I want to compare using Python. Both files have a date in their header, so I want to ignore this line during the comparison: it will always vary and should not be treated as a difference.
File1
Date : 04/29/2013
Some Text
More Text
....
File2
Date : 04/28/2013
Some Text
More Text
....
I have tried comparing them using the filecmp module, but it doesn't support any argument to ignore a pattern. Is there any other module that can be used for this purpose? I tried using difflib but was not successful. Moreover, I just want to know whether there is a difference between the files, as True or False; difflib was printing all the lines, marked with whitespace, even when there was no difference.
Upvotes: 2
Views: 4404
Reputation: 133544
Use itertools.ifilter (or the builtin filter in Python 3):
itertools.ifilter(predicate, iterable)
Your predicate should be a function returning False for lines you want to ignore, e.g.
def predicate(line):
    if 'something' in line:
        return False  # ignore it
    return True
Then use it on your file object: fin = ifilter(predicate, fin)
Then just use something like:
from itertools import izip, ifilter # on Python 3 instead use builtin zip and filter
f1 = ifilter(predicate, f1)
f2 = ifilter(predicate, f2)
all(x == y for x, y in izip(f1, f2))
You don't need difflib unless you want to see what the differences were, and since you have tried filecmp I assume you only want to know whether there were differences or not. Unfortunately, filecmp only works with filenames.
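For Python 3, where ifilter and izip no longer exist, the same idea might be sketched as below. The names is_relevant and files_match are made up for illustration. One deliberate change from the snippet above: zip_longest is used instead of zip, on the assumption that a file which is merely a prefix of the other should count as different (plain zip stops at the shorter input and would report them as equal).

```python
from itertools import zip_longest


def is_relevant(line):
    """Return False for lines that should be ignored (here: the Date header)."""
    return not line.startswith('Date')


def files_match(path1, path2, keep=is_relevant):
    """Return True if both files have identical lines after filtering."""
    sentinel = object()  # stands in for "this file ended before the other"
    with open(path1) as f1, open(path2) as f2:
        lines1 = filter(keep, f1)
        lines2 = filter(keep, f2)
        # A real line never equals the sentinel, so unequal lengths give False.
        return all(a == b
                   for a, b in zip_longest(lines1, lines2, fillvalue=sentinel))
```

Called as files_match('File1.txt', 'File2.txt'), this returns a plain True or False and stops reading at the first mismatch.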
Also, to skip just the first line of each file, use itertools.islice(fin, 1, None). To ignore lines by content instead, filter with a predicate:
from itertools import ifilter, izip  # on Python 3 use the builtin filter and zip

def predicate(line):
    ''' you can add other general checks in here '''
    if line.startswith('Date'):
        return False  # ignore it
    return True

with open('File1.txt') as f1, open('File2.txt') as f2:
    f1 = ifilter(predicate, f1)
    f2 = ifilter(predicate, f2)
    print(all(x == y for x, y in izip(f1, f2)))
>>> True
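The islice variant could look roughly like this in Python 3. It is only a sketch: same_after_first_line is a hypothetical helper name, and it assumes the date really is always the first line of both files.

```python
from itertools import islice, zip_longest


def same_after_first_line(path1, path2):
    """Compare two files line by line, skipping the first line of each."""
    sentinel = object()  # stands in for "this file ended before the other"
    with open(path1) as f1, open(path2) as f2:
        body1 = islice(f1, 1, None)  # start at index 1, i.e. drop line 0
        body2 = islice(f2, 1, None)
        return all(a == b
                   for a, b in zip_longest(body1, body2, fillvalue=sentinel))
```

Unlike the predicate version, this ignores the first line whatever it contains, and it would not skip a Date line that appears further down.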
Upvotes: 6
Reputation: 15357
If you know the date is always on the first line and you read the lines into a list of strings, you can simply drop the first line by writing lines[1:]
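A minimal sketch of that approach, assuming both files are small enough to read into memory (same_ignoring_header is a hypothetical name):

```python
def same_ignoring_header(path1, path2):
    """Return True if the files are identical apart from their first line."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()  # list of strings, one per line
        lines2 = f2.readlines()
    # lines[1:] is every line except the first (the Date header).
    return lines1[1:] == lines2[1:]
```

Comparing the two lists directly also catches files of different lengths.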
Added after comment:
It's probably best to use ifilter as in the other solution. If the files are different you have to iterate through them (using two indices, one for each file) and skip lines that contain one of the keywords.
Upvotes: 0