Reputation: 30922
What would be the best way to compare large paragraphs of text and identify the differences between them? For example, if string A and string B are the same except for a few missing words, how would I highlight those words?
Originally I thought of breaking each text into an array of words and comparing the elements position by position. However, this fails as soon as a word is deleted or inserted, because every word after that point shifts and no longer lines up.
Upvotes: 0
Views: 2441
Reputation: 9617
If it's a one-shot deal, save them both in MS Word and use the document compare function.
Upvotes: 0
Reputation: 24535
Here is an implementation of a merge engine that compares two HTML files and shows the differences highlighted: http://www.codeproject.com/KB/string/htmltextcompare.aspx
Upvotes: 0
Reputation: 160952
Usually text difference is measured in terms of edit distance, which is essentially the minimum number of character additions, deletions or changes necessary to transform one text into the other.
A common way to compute it uses dynamic programming.
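To make the dynamic programming idea concrete, here is a minimal sketch in Python (chosen only for brevity; the same loops port directly to C#). It computes the Levenshtein variant of edit distance, where each insertion, deletion or substitution costs 1. The function name `edit_distance` is just an illustrative choice, not something from a library.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    or substitutions needed to turn a into b (Levenshtein distance)."""
    # prev[j] = distance between the prefix of a processed so far
    # and the first j characters of b. Start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # keep (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

This only gives you a number. To highlight the actual differences you would also need to record which choice was made in each cell and walk back through the table. Only two rows are kept here, so memory stays proportional to the length of `b`, but the running time is still proportional to `len(a) * len(b)`, which can get slow on very long texts compared character by character.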
Upvotes: 0
Reputation: 3084
I came across this a few months back while working on a small project. I haven't used it much, but it might set you on the right track.
http://www.codeproject.com/KB/recipes/DiffAlgorithmCS.aspx
Upvotes: 1
Reputation: 3502
You want to look into Longest Common Subsequence algorithms. Most languages have a library that will do the dirty work for you, and here is one for C#. Searching for "C# diff" or "VB.NET diff" will help you find additional libraries that suit your needs.
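If you want to see how a Longest Common Subsequence diff works before picking a library, here is a rough word-level sketch in Python (used only for brevity; the logic translates directly to C#). The names `word_diff` and `render`, and the `[-word]` / `[+word]` highlight markers, are made up for this example. It splits on whitespace, so punctuation stays attached to words.

```python
def word_diff(old: str, new: str) -> list:
    """Return a list of (op, word) pairs, where op is '=' (unchanged),
    '-' (only in old) or '+' (only in new)."""
    a, b = old.split(), new.split()
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the start, emitting one operation per step.
    ops = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("-", a[i]))  # dropping a[i] keeps the LCS intact
            i += 1
        else:
            ops.append(("+", b[j]))  # b[j] was inserted
            j += 1
    ops.extend(("-", w) for w in a[i:])  # leftover words only in old
    ops.extend(("+", w) for w in b[j:])  # leftover words only in new
    return ops


def render(ops: list) -> str:
    """Mark removed words as [-word] and added words as [+word]."""
    return " ".join(w if op == "=" else f"[{op}{w}]" for op, w in ops)


print(render(word_diff("the quick brown fox", "the brown fox jumps")))
# the [-quick] brown fox [+jumps]
```

Because words are matched through the common subsequence rather than by position, a single inserted or deleted word no longer throws off everything after it. The table needs memory proportional to the product of the two word counts, which is fine for paragraphs but is one reason real diff libraries use more refined algorithms.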
Upvotes: 0