difflib output is very strange, adding extra whitespace on each character

Question

I'm playing around with difflib in Python and I'm having some difficulty getting the output to look good. For some strange reason, difflib is adding a single whitespace before each character. For example, I have a file (textfile01.txt) that looks like this:

test text which has no meaning

and textfile02.txt

test text which has no meaning

but looks nice

Here's a small code sample for how I'm trying to accomplish the comparison:

import difflib

handle01 = open('textfile01.txt', 'r')
handle02 = open('textfile02.txt', 'r')

diff = difflib.ndiff(handle01.read(), handle02.read())
print("".join(diff))

Then, I get this ugly output that looks...very strange:

t e s t t e x t w h i c h h a s n o m e a n i n g-

- b- u- t- - l- o- o- k- s- - n- i- c- e

As you can see, the output looks horrible. I've just been following basic difflib tutorials I found online, and according to those, the output should look completely different. I have no clue what I'm doing wrong. Any ideas?

badp · Accepted Answer

difflib.ndiff compares sequences of strings (normally lists of lines), but you are passing it plain strings, and a string is itself a sequence of single characters. The function therefore compares the two texts character by character.

>>> list(difflib.ndiff("test", "testa"))
['  t', '  e', '  s', '  t', '+ a']

(Literally, you can go from the list ["t", "e", "s", "t"] to the list ["t", "e", "s", "t", "a"] by adding the element "a" at the end.)
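A quick way to confirm that ndiff treats a string as a sequence of one-character strings is to diff the same text passed as an explicit list of characters; the two results should be identical. A small sketch, written for Python 3:

```python
import difflib

# Iterating over a str yields one character at a time, so ndiff sees
# "test" exactly as it would see ["t", "e", "s", "t"].
from_string = list(difflib.ndiff("test", "testa"))
from_chars = list(difflib.ndiff(list("test"), list("testa")))

print(from_string == from_chars)  # True
print(from_string)                # ['  t', '  e', '  s', '  t', '+ a']
```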

You want to change read() to readlines() so you can compare the two files in a linewise fashion, which is probably what you were expecting.

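You also want to change "".join(... to "\n".join(... so that each diff entry lands on its own line, as in a normal diff. This matters for lines without a trailing newline, as in the example below; lines returned by readlines() already end in "\n", so for file input "".join(... is enough.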

>>> list(difflib.ndiff(["test"], ["testa"]))
['- test', '+ testa', '?     +\n']
>>> print("\n".join(_))
- test
+ testa
?     +

(Here difflib is being extra nice and marking the exact position where the character was added in the ? line.)
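Putting the two fixes together, a minimal Python 3 sketch of the line-by-line comparison might look like this. The helper name diff_files is hypothetical, and the demo block simply recreates the question's two files (assumed to end with a newline) in a temporary directory so the script runs on its own:

```python
import difflib
import os
import tempfile


def diff_files(path_a, path_b):
    """Hypothetical helper: return an ndiff of two text files, line by line."""
    with open(path_a) as handle_a, open(path_b) as handle_b:
        # readlines() returns a list of lines, each still ending in "\n",
        # so ndiff compares whole lines instead of single characters.
        lines_a = handle_a.readlines()
        lines_b = handle_b.readlines()
    # The input lines keep their "\n", so every ndiff output line ends in
    # one too; joining with "" is enough to get one entry per line.
    return "".join(difflib.ndiff(lines_a, lines_b))


if __name__ == "__main__":
    # Recreate the question's example files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmpdir:
        path_a = os.path.join(tmpdir, "textfile01.txt")
        path_b = os.path.join(tmpdir, "textfile02.txt")
        with open(path_a, "w") as f:
            f.write("test text which has no meaning\n")
        with open(path_b, "w") as f:
            f.write("test text which has no meaning\n\nbut looks nice\n")
        print(diff_files(path_a, path_b), end="")
```

For the question's files this prints the unchanged first line prefixed with two spaces, followed by the added blank line and "but looks nice", each prefixed with "+ ".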
