Reputation: 5198
The ndiff function from difflib offers a nice interface for detecting differences between lines. It does a great job when the lines are close enough:
>>> from difflib import ndiff
>>> print('\n'.join(ndiff(['foo*'], ['foot'])))
- foo*
?    ^
+ foot
?    ^
But when the lines are too dissimilar, the rich reporting is no longer possible:
>>> print('\n'.join(ndiff(['foo'], ['foo*****'])))
- foo
+ foo*****
This is the use case I am hitting, and I am trying to find a way to use ndiff (or the underlying class Differ) to force the reporting even if the strings are too dissimilar.
For the failing example, I would like to have a result like:
>>> print('\n'.join(ndiff(['foo'], ['foo*****'])))
- foo
+ foo*****
?    +++++
Upvotes: 2
Reputation: 22324
It seems what you want to do here is not to compare across multiple lines, but across strings. You can then pass your strings directly, without a list, and you should get a behaviour close to the one you are looking for.
>>> print('\n'.join(ndiff('foo', 'foo*****')))
  f
  o
  o
+ *
+ *
+ *
+ *
+ *
Even though the output format is not exactly the one you are looking for, it encapsulates the correct information. We can write an output adapter to produce the right format.
def adapter(out):
    chars = []
    symbols = []
    for c in out:
        # Each item looks like '+ *': c[0] is the diff symbol, c[2] the character.
        chars.append(c[2])
        symbols.append(c[0])
    return ''.join(chars), ''.join(symbols)
This can be used like so.
>>> print('\n'.join(adapter(ndiff('foo', 'foo*****'))))
foo*****
   +++++
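To get the three-line layout from the question rather than two joined strings, the same character-level idea can be sketched as below. This is only a sketch: hint_lines is a hypothetical helper name, not part of difflib, and it reads the opcodes from SequenceMatcher directly instead of post-processing ndiff output. It handles a single pair of strings, not whole lists of lines.

```python
from difflib import SequenceMatcher


def hint_lines(old, new):
    """Return ndiff-style lines for one pair of strings, always with '?' hints.

    hint_lines is a hypothetical helper, not a difflib function.
    """
    old_tags = new_tags = ""
    matcher = SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        len_old, len_new = i2 - i1, j2 - j1
        if tag == "replace":
            old_tags += "^" * len_old
            new_tags += "^" * len_new
        elif tag == "delete":
            old_tags += "-" * len_old
        elif tag == "insert":
            new_tags += "+" * len_new
        else:  # "equal"
            old_tags += " " * len_old
            new_tags += " " * len_new

    lines = ["- " + old]
    if old_tags.strip():  # only emit a hint line if it marks something
        lines.append("? " + old_tags.rstrip())
    lines.append("+ " + new)
    if new_tags.strip():
        lines.append("? " + new_tags.rstrip())
    return lines


print("\n".join(hint_lines("foo", "foo*****")))
```

For the failing example this prints the "- foo" and "+ foo*****" lines followed by a "?" line with five "+" marks under the added asterisks.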
Upvotes: 0
Reputation: 17781
The function responsible for printing the context (i.e. the lines starting with ?) is Differ._fancy_replace. That function only emits the context when the two lines are at least 75% similar (see the cutoff variable). Unfortunately, that 75% cutoff is hard-coded and cannot be changed.
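To see where the two examples from the question fall relative to that threshold, here is a small check. It assumes the similarity in question is SequenceMatcher.ratio() with no junk filter (ndiff passes a character-junk filter by default, but it only concerns spaces and tabs, which these strings do not contain). The similarity function is a hypothetical helper for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # ratio() is 2 * matches / (len(a) + len(b)), a float in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


print(similarity('foo*', 'foot'))     # 2 * 3 / 8  = 0.75
print(similarity('foo', 'foo*****'))  # 2 * 3 / 11, roughly 0.545
```

The first pair sits exactly at 0.75 and gets the ? lines; the second is well below it, so the lines are reported as a plain removal and addition.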
What I can suggest is to subclass Differ and provide a version of _fancy_replace that simply ignores the cutoff. Here it is:
from difflib import Differ, SequenceMatcher


class FullContextDiffer(Differ):
    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        """
        Copied and adapted from https://github.com/python/cpython/blob/3.6/Lib/difflib.py#L928
        """
        # Start at 0 (not 0.74) so that any partial match qualifies.
        best_ratio = 0
        best_i = best_j = None
        cruncher = SequenceMatcher(self.charjunk)

        # Find the most similar pair of non-identical lines.
        for j in range(blo, bhi):
            bj = b[j]
            cruncher.set_seq2(bj)
            for i in range(alo, ahi):
                ai = a[i]
                if ai == bj:
                    continue
                cruncher.set_seq1(ai)
                if cruncher.real_quick_ratio() > best_ratio and \
                        cruncher.quick_ratio() > best_ratio and \
                        cruncher.ratio() > best_ratio:
                    best_ratio, best_i, best_j = cruncher.ratio(), i, j

        if best_i is None:
            # No usable pair (nothing in common, or only identical lines):
            # fall back to a plain -/+ dump instead of failing on best_i.
            yield from self._plain_replace(a, alo, ahi, b, blo, bhi)
            return

        # Emit everything before the synch pair.
        yield from self._fancy_helper(a, alo, best_i, b, blo, best_j)

        # Build the '?' hint lines for the synch pair.
        aelt, belt = a[best_i], b[best_j]
        atags = btags = ""
        cruncher.set_seqs(aelt, belt)
        for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
            la, lb = ai2 - ai1, bj2 - bj1
            if tag == 'replace':
                atags += '^' * la
                btags += '^' * lb
            elif tag == 'delete':
                atags += '-' * la
            elif tag == 'insert':
                btags += '+' * lb
            elif tag == 'equal':
                atags += ' ' * la
                btags += ' ' * lb
            else:
                raise ValueError('unknown tag %r' % (tag,))
        yield from self._qformat(aelt, belt, atags, btags)

        # Emit everything after the synch pair.
        yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
And here is an example of how it works:
a = [
    'foo',
    'bar',
    'foobar',
]
b = [
    'foo',
    'bar',
    'barfoo',
]
print('\n'.join(FullContextDiffer().compare(a, b)))
# Output:
#
#   foo
#   bar
# - foobar
# ?    ---
#
# + barfoo
# ? +++
Upvotes: 1