m.edmondson

Reputation: 30922

Telling the difference between two large pieces of text

What would be the best way to compare large paragraphs of text and identify the differences between them? For example, if string A and string B are the same except for a few missing words, how would I highlight those words?

Originally I thought of breaking each text down into an array of words and comparing the elements position by position. However, this breaks down when a word is deleted or inserted, because every later word shifts out of alignment.

Upvotes: 0

Views: 2441

Answers (6)

Beth

Reputation: 9617

If it's a one-shot deal, save them both in MS Word and use the document compare function.

Upvotes: 0

Mark Redman

Reputation: 24535

Here is an implementation of a Merge Engine that compares two HTML files and shows the highlighted differences: http://www.codeproject.com/KB/string/htmltextcompare.aspx

Upvotes: 0

BrokenGlass

Reputation: 160952

Text difference is usually measured in terms of edit distance: the minimum number of character insertions, deletions, or substitutions needed to transform one text into the other.

Edit distance is commonly computed with dynamic programming.
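As a rough illustration of the dynamic programming approach, here is a minimal Python sketch of the Levenshtein variant of edit distance. The function name `edit_distance` is made up for this example, and the sketch assumes every insertion, deletion, and substitution costs 1. It only returns the distance; it does not report which characters changed.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (unit costs assumed)."""
    # Row for the empty prefix of a: turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]
```

Keeping only two rows at a time holds memory proportional to the length of `b`, which matters for big paragraphs. Highlighting the actual changes would require keeping the full table and walking back through it.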

Upvotes: 0

Stephen

Reputation: 3084

I came across this a few months back while working on a small project; it might set you on the right track.

http://www.codeproject.com/KB/recipes/DiffAlgorithmCS.aspx

Upvotes: 1

Darth Android

Reputation: 3502

You want to look into Longest Common Subsequence algorithms. Most languages have a library which will do the dirty work for you, and here is one for C#. Searching for "C# diff" or "VB.Net diff" will help you find additional libraries that suit your needs.
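To give an idea of what a Longest Common Subsequence approach looks like, here is a minimal word-level sketch in Python (not C#). The function name `word_diff` and the "same"/"removed"/"added" labels are invented for this example, and it assumes splitting on whitespace is good enough. Real diff libraries are usually more memory-efficient than this full table.

```python
def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Label each word as 'same', 'removed' (only in old) or 'added' (only in new)."""
    a, b = old.split(), new.split()

    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:].
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, emitting words in document order.
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(("same", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            result.append(("removed", a[i]))
            i += 1
        else:
            result.append(("added", b[j]))
            j += 1

    # Whatever is left over on either side has no counterpart.
    result.extend(("removed", word) for word in a[i:])
    result.extend(("added", word) for word in b[j:])
    return result
```

Because the comparison follows the common subsequence rather than word positions, a single inserted or deleted word does not throw off the alignment of everything after it.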

Upvotes: 0

Mitch Wheat

Reputation: 300719

Use a diff algorithm.
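For illustration, here is a short sketch of using an existing diff implementation instead of writing one, in this case Python's standard `difflib` module. The function name `highlight_changes` and the `[-...-]` / `{+...+}` markers are arbitrary choices for this example, and words are assumed to be separated by whitespace.

```python
import difflib


def highlight_changes(old: str, new: str) -> str:
    """Return the text with removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    # autojunk=False turns off the heuristic that ignores very frequent items.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)
```

Swapping the markers for HTML tags or console colours would turn this into the highlighting the question asks for. Note that the output normalises whitespace, since the words are rejoined with single spaces.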

Upvotes: 3
