VitalyB

Reputation: 12855

Using 'diff' (or anything else) to get character-level diff between text files

I'd like to use 'diff' to get both a line-level and a character-level difference. For example, consider:

File 1

abcde
abc
abcccd

File 2

abcde
ab
abccc

Using diff -u I get:

@@ -1,3 +1,3 @@
 abcde
-abc
-abcccd
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

However, it only shows me that there were changes in these lines. What I'd like to see is something like:

@@ -1,3 +1,3 @@
 abcde
-ab<ins>c</ins>
-abccc<ins>d</ins>
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

You get my drift.

Now, I know I can use other engines to mark/check the difference on a specific line. But I'd rather use one tool that does all of it.

Upvotes: 170

Views: 97531

Answers (17)

dagelf

Reputation: 1729

Just use meld. It highlights character level differences and has line wrapping.

I've tried everything and nothing comes close. Pretty sure it uses an algorithm that is unique.

Upvotes: 0

Eduard Florinescu

Reputation: 17511

As a comment on the main answer points out, you don't need to commit anything to use git diff: pass the --no-index argument to compare two arbitrary files:

git diff --no-index --word-diff=color --word-diff-regex=. file1 file2


Green marks characters that appear only in the second file.

Red marks characters that appear only in the first file.

Upvotes: 35

Miere

Reputation: 1595

I think the simpler solution is always a good solution. In my case, the below code helps me a lot. I hope it helps anybody else.

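#!/usr/bin/env python3
import sys


def readfile(file_name):
    # Read the whole file into a single string.
    with open(file_name) as f:
        return f.read()


def diff(s1, s2):
    # Return the index of the first differing character, or -1 if identical.
    for counter, (ch1, ch2) in enumerate(zip(s1, s2)):
        if ch1 != ch2:
            return counter
    # No mismatch in the shared part: one string may be a prefix of the other.
    return -1 if len(s1) == len(s2) else min(len(s1), len(s2))


f1 = readfile(sys.argv[1])
f2 = readfile(sys.argv[2])
pos = diff(f1, f2)
end = pos + 200

if pos >= 0:
    # Show up to 200 characters from each file, starting at the first difference.
    print("Different at:", pos)
    print(">", f1[pos:end])
    print("<", f2[pos:end])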

You can compare two files with the following syntax in your favorite terminal:

$ ./diff.py fileNumber1 fileNumber2

Upvotes: 0

sudo rm -rf slash

Reputation: 1254

Not a complete answer, but if cmp -l's output is not clear enough, you can use:

sed 's/\(.\)/\1\n/g' file1 > file1.vertical
sed 's/\(.\)/\1\n/g' file2 > file2.vertical
diff file1.vertical file2.vertical
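
The same "one character per line" trick can be sketched in Python with difflib from the standard library. This is only an illustration, not a replacement for the sed commands: the function names are made up for the example, and unlike sed it shows each character through repr() (so a newline becomes its own visible '\n' entry instead of an empty line).

```python
import difflib


def vertical(text):
    """Turn a string into a list with one visible entry per character."""
    # repr() keeps spaces, tabs and newlines readable in the output.
    return [repr(ch) for ch in text]


def char_level_unified(old_text, new_text):
    """Unified diff computed over one-character 'lines'."""
    return list(
        difflib.unified_diff(
            vertical(old_text),
            vertical(new_text),
            fromfile="file1.vertical",
            tofile="file2.vertical",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # The two files from the question, as strings.
    for line in char_level_unified("abcde\nabc\nabcccd", "abcde\nab\nabccc"):
        print(line)
```

Every line of the result that starts with a single - or + is then exactly one removed or added character.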

Upvotes: 2

Tom Hale

Reputation: 46715

Coloured, character-level diff output

Here's what you can do with the script below and diff-highlight (which is part of git):

#!/bin/sh -eu

# Use diff-highlight to show word-level differences

diff -U3 --minimal "$@" |
  sed 's/^-/\x1b[1;31m-/;s/^+/\x1b[1;32m+/;s/^@/\x1b[1;34m@/;s/$/\x1b[0m/' |
  diff-highlight

(Credit to @retracile's answer for the sed highlighting)

Upvotes: 8

Mr. Deathless

Reputation: 1391

Python has a convenient library named difflib that might help answer your question.

Below are two one-liners using difflib for different Python versions.

python3 -c 'import difflib, sys; \
  print("".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(),open(sys.argv[2]).readlines())))'
python2 -c 'import difflib, sys; \
  print "".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'

These might come in handy as a shell alias which is easier to move around with your .${SHELL_NAME}rc.

$ alias char_diff="python2 -c 'import difflib, sys; print \"\".join(difflib.ndiff(open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'"
$ char_diff old_file new_file

And a more readable version to put in a standalone file.

#!/usr/bin/env python3
import difflib
import sys

# Read both files as lists of lines (line endings are kept).
with open(sys.argv[1]) as old_f, open(sys.argv[2]) as new_f:
    old_lines, new_lines = old_f.readlines(), new_f.readlines()

# ndiff adds '?' hint lines that point at the characters that changed.
diff = difflib.ndiff(old_lines, new_lines)
sys.stdout.writelines(diff)

Upvotes: 12

Ned

Reputation: 2262

Python's difflib is ace if you want to do this programmatically. For interactive use, I use vim's diff mode (easy enough to use: just invoke vim with vimdiff a b). I also occasionally use Beyond Compare, which does pretty much everything you could hope for from a diff tool.

I haven't seen any command-line tool that does this usefully, but as Will notes, the difflib example code might help.

Upvotes: 29

senf78

Reputation: 1505

Git has a word diff, and defining all characters as words effectively gives you a character diff. However, newline changes are ignored.

Example

Create a repository like this:

mkdir chardifftest
cd chardifftest
git init
echo -e 'foobarbaz\ncatdog\nfox' > file
git add -A; git commit -m 1
echo -e 'fuobArbas\ncat\ndogfox' > file
git add -A; git commit -m 2

Now run git diff --word-diff=color --word-diff-regex=. master^ master and you'll get a colored diff in which individual added and removed characters are marked inside each line.

Note how both additions and deletions are recognized at the character level, while both additions and deletions of newlines are ignored.

You may also want to try one of these:

git diff --word-diff=plain --word-diff-regex=. master^ master
git diff --word-diff=porcelain --word-diff-regex=. master^ master

Upvotes: 144

Roman Riabenko

Reputation: 251

ccdiff is a convenient dedicated tool for the task.

By default, it highlights the differences in color, but it can be used in a console without color support too.

The package is included in the main repository of Debian:

ccdiff is a colored diff that also colors inside changed lines.

All command-line tools that show the difference between two files fall short in showing minor changes visuably useful. ccdiff tries to give the look and feel of diff --color or colordiff, but extending the display of colored output from colored deleted and added lines to colors for deleted and addedd characters within the changed lines.

Upvotes: 7

zhanxw

Reputation: 3279

You can use:

diff -u f1 f2 |colordiff |diff-highlight


colordiff is an Ubuntu package. You can install it using sudo apt-get install colordiff.

diff-highlight is from git (since version 2.9). It is located in /usr/share/doc/git/contrib/diff-highlight/diff-highlight. You can put it somewhere in your $PATH.

Upvotes: 56

Alex Harvey

Reputation: 15472

Most of these answers mention using diff-highlight, a Perl module. But I didn't want to figure out how to install a Perl module, so I made a few minor changes to turn it into a self-contained Perl script.

You can install it using:

▶ curl -o /usr/local/bin/DiffHighlight.pl \
   https://raw.githubusercontent.com/alexharv074/scripts/master/DiffHighlight.pl

And the usage (if you have the Ubuntu colordiff mentioned in zhanxw's answer):

▶ diff -u f1 f2 | colordiff | DiffHighlight.pl

And the usage (if you don't):

▶ diff -u f1 f2 | DiffHighlight.pl

Upvotes: 0

Joshua

Reputation: 1235

I also wrote my own script to solve this problem using the Longest common subsequence algorithm.

Run it like this:

JLDiff.py a.txt b.txt out.html

The result is in HTML with red and green coloring. Larger files take disproportionately longer to process, but this does a true character-by-character comparison without checking line by line first.
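
For readers who want to see the idea, here is a minimal sketch of a character diff built on a longest common subsequence table. It is not the code of JLDiff.py (which I have not reproduced here); the function names and the (tag, char) output format are assumptions made for the example. This textbook version needs time and memory proportional to the product of the two input lengths, which is why big files get slow.

```python
def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def char_diff(a, b):
    """Return (tag, char) pairs: ' ' kept, '-' only in a, '+' only in b."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps the common subsequence at least as long.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is a pure deletion or insertion.
    ops.extend(("-", ch) for ch in a[i:])
    ops.extend(("+", ch) for ch in b[j:])
    return ops


if __name__ == "__main__":
    for tag, ch in char_diff("abcccd", "abccc"):
        print(tag, ch)
```

Because the whole input is treated as one string, newlines are compared like any other character.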

Upvotes: 5

Chris Prince

Reputation: 7564

cmp -l file1 file2 | wc

This worked well for me. The leftmost number in the output is the number of bytes that differ.
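
As a rough Python equivalent of that count (a sketch only; count_differing_bytes is a name invented for this example), you can compare the two files byte by byte at the same offsets. Like cmp -l, it does not realign after an insertion or deletion, so a single missing character makes every following byte count as different until the contents happen to line up again.

```python
import sys


def count_differing_bytes(data_a, data_b):
    """Return (differing, extra).

    differing: offsets present in both inputs where the bytes are not equal.
    extra: how many bytes the longer input has beyond the shorter one.
    """
    differing = sum(x != y for x, y in zip(data_a, data_b))
    extra = abs(len(data_a) - len(data_b))
    return differing, extra


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        print(*count_differing_bytes(fa.read(), fb.read()))
```

For the two files in the question this reports 4 differing bytes and 2 extra bytes, even though only two characters were actually removed.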

Upvotes: 7

Venkataramesh Kommoju

Reputation: 1101

You can use the cmp command in Solaris:

cmp

Compare two files, and if they differ, tells the first byte and line number where they differ.

Upvotes: 20

gm2008

Reputation: 4325

Here is an online text comparison tool: http://text-compare.com/

It can highlight every single character that is different and then continue comparing the rest.

Upvotes: 3

naught101

Reputation: 19533

If you keep your files in Git, you can diff between versions with the diff-highlight script, which will show different lines, with differences highlighted.

Unfortunately it only works when the number of lines removed matches the number of lines added - there is stub code for when lines don't match, so presumably this could be fixed in the future.
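
To make the limitation concrete, here is a small Python sketch of the approach as I understand it from the diff-highlight README: pair each removed line with the added line at the same position, then treat everything between the common prefix and the common suffix as the changed part. This is an illustration, not the script's actual code; split_change and highlight_hunk are hypothetical names.

```python
def split_change(old, new):
    """Return (prefix, old_middle, new_middle, suffix) for a pair of lines."""
    limit = min(len(old), len(new))
    # Length of the longest common prefix.
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return old[:p], old[p:len(old) - s], new[p:len(new) - s], old[len(old) - s:]


def highlight_hunk(removed, added):
    """Pair removed and added lines one-to-one, or give up if the counts differ."""
    if len(removed) != len(added):
        return None  # same restriction as described above
    return [split_change(o, n) for o, n in zip(removed, added)]


if __name__ == "__main__":
    print(highlight_hunk(["abc", "abcccd"], ["ab", "abccc"]))
```

With unequal line counts there is no obvious pairing, which is presumably why the script leaves such hunks alone.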

Upvotes: 0

Will

Reputation: 75615

Python's difflib can do this.

The documentation includes an example command-line program for you.

The exact format is not as you specified, but it would be straightforward to either parse the ndiff-style output or to modify the example program to generate your notation.
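
As a minimal sketch of that second option, the function below uses difflib.SequenceMatcher to wrap the characters of an old line that are missing from the new line, which reproduces the notation in the question. The name mark_removed and the default tag strings are assumptions for the example; pairing up the old and new lines is left to the caller.

```python
import difflib


def mark_removed(old_line, new_line, open_tag="<ins>", close_tag="</ins>"):
    """Wrap the characters of old_line that do not survive into new_line."""
    matcher = difflib.SequenceMatcher(None, old_line, new_line, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        chunk = old_line[i1:i2]
        if tag == "equal":
            out.append(chunk)
        elif tag in ("delete", "replace"):
            out.append(open_tag + chunk + close_tag)
        # "insert" opcodes cover no characters of old_line, so nothing to emit.
    return "".join(out)


if __name__ == "__main__":
    print("-" + mark_removed("abc", "ab"))
    print("-" + mark_removed("abcccd", "abccc"))
```

For the example files this prints -ab<ins>c</ins> and -abccc<ins>d</ins>.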

Upvotes: 4
