I'm playing around with difflib in Python and I'm having some difficulty getting the output to look good. For some strange reason, difflib is adding a single whitespace before each character. For example, I have a file (textfile01.txt) that looks like this:
test text which has no meaning
and textfile02.txt looks like this:
test text which has no meaning
but looks nice
Here's a small code sample for how I'm trying to accomplish the comparison:
import difflib
handle01 = open('textfile01.txt', 'r')
handle02 = open('textfile02.txt', 'r')
d = difflib.ndiff(handle01.read(), handle02.read())
print("".join(list(d)))
Then I get this ugly, very strange output:
t e s t t e x t w h i c h h a s n o m e a n i n g-
- b- u- t- - l- o- o- k- s- - n- i- c- e
As you can see, the output looks horrible. I've just been following basic difflib tutorials I found online, and according to those, the output should look completely different. I have no clue what I'm doing wrong. Any ideas?
difflib.ndiff compares lists of strings, but you are passing it plain strings, and a string is really a sequence of characters. The function is therefore comparing the two files character by character.
>>> list(difflib.ndiff("test", "testa"))
['  t', '  e', '  s', '  t', '+ a']
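(Literally, you can go from the list ["t", "e", "s", "t"] to the list ["t", "e", "s", "t", "a"] by adding the element "a" at the end.) Each element of the result carries a two-character code ("  " for unchanged, "- " for removed, "+ " for added), and that code is the extra whitespace you see in front of every character.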
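A minimal sketch (Python 3) contrasting the two kinds of input: the first call reproduces the character-level comparison described above, the second passes lists of lines so whole lines are compared. The sample lines in the second call are hypothetical, chosen only for illustration.

```python
import difflib

# A plain string is a sequence of characters, so ndiff treats each
# character as one "line" and prefixes it with a 2-char code.
char_diff = list(difflib.ndiff("test", "testa"))
print(char_diff)  # ['  t', '  e', '  s', '  t', '+ a']

# A list of strings is compared element by element, i.e. whole lines.
# (Hypothetical sample lines, for illustration only.)
line_diff = list(difflib.ndiff(
    ["first line", "second line"],
    ["first line", "second line", "third line"],
))
print(line_diff)  # ['  first line', '  second line', '+ third line']
```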
You want to change read() to readlines() so you can compare the two files line by line, which is probably what you were expecting.
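A minimal sketch of the question's script with that change applied, written for Python 3. The helper name diff_files is hypothetical, and the demo recreates the two sample files from the question in a temporary directory only so the sketch runs on its own; with the real files you would call diff_files("textfile01.txt", "textfile02.txt").

```python
import difflib
import os
import tempfile


def diff_files(path_a, path_b):
    """Hypothetical helper: ndiff two text files, compared line by line."""
    with open(path_a) as fa, open(path_b) as fb:
        # readlines() returns a list of lines, each keeping its trailing
        # "\n", so ndiff compares whole lines instead of characters.
        lines_a = fa.readlines()
        lines_b = fb.readlines()
    # The lines already end in "\n", so joining with "" is enough.
    return "".join(difflib.ndiff(lines_a, lines_b))


# Demo only: write the question's sample text to temporary files.
with tempfile.TemporaryDirectory() as tmpdir:
    path1 = os.path.join(tmpdir, "textfile01.txt")
    path2 = os.path.join(tmpdir, "textfile02.txt")
    with open(path1, "w") as f:
        f.write("test text which has no meaning\n")
    with open(path2, "w") as f:
        f.write("test text which has no meaning\nbut looks nice\n")
    result = diff_files(path1, path2)

print(result, end="")
```

With the sample files this prints the unchanged line prefixed by two spaces, followed by "+ but looks nice".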
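You may also need to change "".join(... to "\n".join(... to get diff-like output on screen, but only when the compared lines don't end in newlines, as in the example below. Lines returned by readlines() keep their trailing "\n", so "".join(... already prints one line per row for them.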
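Whether "".join or "\n".join is right depends on whether the compared lines still carry their "\n". A small sketch (Python 3, using the question's sample text as in-memory strings) shows both cases producing the same rows:

```python
import difflib

text_a = "test text which has no meaning\n"
text_b = "test text which has no meaning\nbut looks nice\n"

# splitlines(True) keeps each "\n", like readlines(): join with "".
out1 = "".join(difflib.ndiff(text_a.splitlines(True),
                             text_b.splitlines(True)))

# Plain splitlines() drops the "\n": join with "\n" instead.
out2 = "\n".join(difflib.ndiff(text_a.splitlines(),
                               text_b.splitlines()))

print(out1, end="")
print(out2)
```

One quirk to keep in mind: the "? " guide lines that ndiff emits always end in "\n", so with "\n".join they can be followed by an empty line.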
>>> list(difflib.ndiff(["test"], ["testa"]))
['- test', '+ testa', '?     +\n']
>>> print("\n".join(_))
- test
+ testa
?     +
(Here difflib is being extra nice and marks the exact position where the character was added in the "?" line.)
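If you only want the "-" and "+" rows, the "? " guide lines are ordinary elements of the ndiff output and can simply be filtered out. A minimal sketch (Python 3), reusing the ["test"] / ["testa"] example:

```python
import difflib

diff = list(difflib.ndiff(["test"], ["testa"]))

# Guide lines start with "? "; the other codes are "  ", "- " and "+ ".
without_hints = [line for line in diff if not line.startswith("? ")]
print("\n".join(without_hints))
```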