I am trying to output the difference between two text files as an HTML file, using the difflib library in Python 2 and its HtmlDiff class.
```python
import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'
res = difflib.HtmlDiff().make_table(V1, V2)
text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()
```
However, the output HTML looks like this in a browser (screenshot not included):
The display compares every single character, which makes it completely unreadable.
What should I change to make the comparison more human-friendly (e.g. full sentences on each side)?
If the inputs are given as lists of lines, the output respects the line structure, but the differences are not highlighted:
```python
import difflib

V1 = ['This has four words']
V2 = ['This has more than four words']
res = difflib.HtmlDiff().make_table(V1, V2)
text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()
```
Resulting HTML, as viewed in a browser (screenshot not included):
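A quick way to see what is going on in both cases (a sketch, assuming Python 3): make_table treats each element of its two arguments as one line, and iterating over a str yields single characters, so passing plain strings produces one table row per character. Passing lists of lines gives one row per line instead. The helper name count_rows is hypothetical; it just counts the <tr> tags in the returned table.

```python
import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

def count_rows(table_html):
    # hypothetical helper: each row of the side-by-side table starts with <tr>
    return table_html.count('<tr>')

# Plain strings: every character is treated as a separate "line".
char_table = difflib.HtmlDiff().make_table(V1, V2)

# Lists of lines: each text is a single line, compared as a whole.
line_table = difflib.HtmlDiff().make_table(V1.splitlines(), V2.splitlines())

print(count_rows(char_table), count_rows(line_table))
```

The line-based table does mark the changed words, but only through CSS class names, so without a stylesheet the browser shows no highlighting.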
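This is an old question, but I struggled with it myself for a few days. Before fixing anything, I was getting the same character-by-character output (screenshot not included). I finally pieced something together by splitting each string into words before diffing, where a and b are the two strings being compared:
```python
html = difflib.HtmlDiff().make_file(a.split(' '), b.split(' '), fromdesc="original", todesc="modified")
```
After adding that simple split, each word is compared as a unit instead of each character.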
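A self-contained version of that idea, as a sketch assuming Python 3; the function name word_diff_file and the output file name word_diff.html are hypothetical. It uses str.split() with no argument instead of split(' '), which is my substitution: it keeps runs of spaces or newlines from producing empty "words".

```python
import difflib

def word_diff_file(a, b):
    # hypothetical helper: diff two texts word by word and return a full HTML page
    # make_file (not make_table) is used so the page includes the CSS that colors changes
    return difflib.HtmlDiff().make_file(
        a.split(), b.split(), fromdesc="original", todesc="modified")

V1 = 'This has four words'
V2 = 'This has more than four words'
html = word_diff_file(V1, V2)

with open("word_diff.html", "w") as f:
    f.write(html)
```

Each word ends up on its own row, so the table gets long for large texts; it trades compactness for readable, whole-word highlighting.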
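The problem is that you don't have the required styles. make_table returns only the table markup, in which changes are tagged with CSS classes such as diff_add, diff_chg and diff_sub; the rules that give those classes their colors are only included by make_file. Try using make_file instead of make_table, and you'll see that it embeds a stylesheet that makes the colors show up as you expect.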
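For the original goal of comparing two text files, a minimal sketch (Python 3 assumed; the function name html_diff_files and the sample file names v1.txt, v2.txt and diff.html are hypothetical) reads each file as a list of lines and passes those lists to make_file. Each table row is then one line, and the embedded stylesheet highlights the changed characters within that line.

```python
import difflib

def html_diff_files(path_a, path_b, context=False):
    # hypothetical helper: diff two text files line by line into a full HTML page
    with open(path_a) as fa, open(path_b) as fb:
        a_lines = fa.readlines()  # one list element per line
        b_lines = fb.readlines()
    # make_file wraps the table in a page whose CSS colors the
    # diff_add / diff_chg / diff_sub classes; context=True would show
    # only changed lines plus a few surrounding ones (numlines, default 5)
    return difflib.HtmlDiff().make_file(
        a_lines, b_lines, fromdesc=path_a, todesc=path_b, context=context)

# Usage with two small sample files (hypothetical names)
with open("v1.txt", "w") as f:
    f.write("This has four words\n")
with open("v2.txt", "w") as f:
    f.write("This has more than four words\n")

html = html_diff_files("v1.txt", "v2.txt")
with open("diff.html", "w") as f:
    f.write(html)
```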
To get inline markup, you can use difflib.SequenceMatcher together with the function defined in this answer: https://stackoverflow.com/a/788780/2318649, which gives this code:
```python
import difflib

def show_diff(seqm):
    # function from https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline
    """Unify operations between two compared strings
    seqm is a difflib.SequenceMatcher instance whose a & b are strings"""
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            # show a replaced span as the old text deleted, then the new text inserted
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        else:
            raise RuntimeError("unexpected opcode: %s" % opcode)
    return ''.join(output)

V1 = 'This has four words but fewer than eleven'
V2 = 'This has more than four words'
sm = difflib.SequenceMatcher(None, V1, V2)
html = "<html><body>" + show_diff(sm) + "</body></html>"
with open("output.html", "w") as f:
    f.write(html)
```
which produces the following output (screenshot not included):
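If you prefer the inline <ins>/<del> style but want to avoid character-level noise, the same opcode loop can run over lists of words instead of characters. This is a sketch assuming Python 3 (html.escape does not exist in Python 2) and the function name inline_word_diff is hypothetical. It escapes the text so that <, > or & in the input cannot break the markup, and it shows a replace opcode as a deletion followed by an insertion.

```python
import difflib
import html

def inline_word_diff(old, new):
    # hypothetical helper: word-level inline diff marked up with <ins>/<del>
    a, b = old.split(), new.split()
    sm = difflib.SequenceMatcher(None, a, b)
    parts = []
    for opcode, a0, a1, b0, b1 in sm.get_opcodes():
        old_words = html.escape(' '.join(a[a0:a1]))
        new_words = html.escape(' '.join(b[b0:b1]))
        if opcode == 'equal':
            parts.append(old_words)
        elif opcode == 'delete':
            parts.append('<del>' + old_words + '</del>')
        elif opcode == 'insert':
            parts.append('<ins>' + new_words + '</ins>')
        else:  # 'replace': old words removed, new words added in their place
            parts.append('<del>' + old_words + '</del> <ins>' + new_words + '</ins>')
    # original whitespace is not preserved; words are re-joined with single spaces
    return ' '.join(parts)

print(inline_word_diff('This has four words', 'This has more than four words'))
# → This has <ins>more than</ins> four words
```

Wrapping the result in <html><body>…</body></html> and writing it to a file works exactly as in the code above.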