Reputation: 7128
I'm trying to diff two strings by phrase, similar to the way that StackOverflow diffs the two strings on the version edits page. What would be an algorithm to do this? Are there gems, or other standard libraries that accomplish this?
EDIT: I've seen other diffing algorithms (Differ with Ruby) and they seem to result in the following:
>> o = 'now is the time when all good men.'
>> p = 'now some time the men time when all good men.'
>> Differ.diff_by_word(o,p).format_as(:html)
=> "now <del class=\"differ\">some</del><ins class=\"differ\">is</ins>
<del class=\"differ\">time </del>the <del class=\"differ\">men </del>time
when all good men."
Note how the words are diffed on a per word basis? I'd like some way of diffing more by phrase, so the above code output:
=> "now <del class=\"differ\">some time the men</del><ins class=\"differ\">is
the</ins> time when all good men."
Am I hoping for too much?
Upvotes: 8
Views: 683
Reputation: 131142
The algorithm you are looking for is Longest Common Subsequence it does most of the work for you.
The outline is something along these lines.
So for example say you have:
"hello world this is a test"
compared with:
"mister hello world"
The result from the LCS is
Now you sprinkle the special sauce when building up. You join the string together while staying mindful of the previous action. The naive algorithm is just join sections that are the same action.
Finally you transform it to html:
<ins>mister</ins> hello world <del>this is a test</del>
Of course the devil is in the detail:
Upvotes: 6