scottm
scottm

Reputation: 28699

How do text differencing applications work?

How do applications like DiffMerge detect differences in text files, and how do they determine when a line is new, and not just on a different line than the file being checked against?

Is this something that is fairly easy to implement? Are there already libraries to do this?

Upvotes: 7

Views: 606

Answers (4)

beef2k
beef2k

Reputation: 2257

It actually is pretty simple; DIFF programes - most of the time - are based on the Longest Common Sequence, which can be solved using a graph algorithm.

This web page gives example implementations in C#.

Upvotes: 4

Nosredna
Nosredna

Reputation: 86276

There are libraries. Here's one: http://code.google.com/p/google-diff-match-patch/

StackOverflow uses Beyond Compare for its diff. I believe it works by calling Beyond Compare from the command line.

Upvotes: 4

Peter Ruderman
Peter Ruderman

Reputation: 12485

That's a complex question. Performing a diff means finding the minimum edit distance between the two files. That is, the minimum number of changes you must make to transform one file into the other. This is equivalent to finding the longest common subsequence of lines between the two files, and this is the basis for the various diff programs. The longest common subsequence problem is well known, and you should be able to find the dynamic programming solution on google.

The trouble with the dynamic programming approach is that it's O(n^2). It's thus very slow on large files and unusable for large, binary strings. The hard part in writing a diff program is optimizing the algorithm for your problem domain, so that you get reasonable performance (and reasonable results). The paper "An Algorithm for Differential File Comparison" by Hunt and McIlroy gives a good description of an early version of the Unix diff utility.

Upvotes: 4

Michael Kristofik
Michael Kristofik

Reputation: 35188

Here's the paper that served as the basis for the UNIX command-line tool diff.

Upvotes: 5

Related Questions