Reputation: 209
I am trying to implement word level matches in Google Diff Match Patch, but it is beating me up.
The result I get is:
=I've never been =|-a-|=t=|= th=|-e-|=se places=|
=I've never been =|=t=|+o+|= th=|+o+|=se places=|
The result I want is:
=I've never been =|-at these-|= places=|
=I've never been =|+to those+|= places=|
The documentation says:
make a copy of diff_linesToChars and call it diff_linesToWords. Look for the line that identifies the next line boundary: lineEnd = text.indexOf('\n', lineStart);
In the C# version, I found the line to change in diff_linesToCharsMunge, which I changed to:
lineEnd = text.Replace(@"/[\n\.,;:]/ g"," ").IndexOf(" ", lineStart);
However, there is no change in granularity: it still finds differences at the character level.
I am calling:
List<Diff> differences = diffs.diff_main(linepair.Original, linepair.Corrected, true);
diffs.diff_cleanupSemantic(differences);
I have stepped through to make sure that it is hitting the change I made (incidentally, there is a hardcoded minimum of 100 characters before it kicks in).
Upvotes: 3
Views: 1567
Reputation: 31
I was stuck with the same problem in the PHP version of this library and found a solution here.
You just have to make a copy of the linesToChars function and call it linesToWords.
Here's how I did it:
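// Encode each distinct word as a single character.
$dtk = new DiffToolkit();
$a = $dtk->linesToWords($old, $new);
$lineText1 = $a[0];
$lineText2 = $a[1];
$lineArray = $a[2];
// Diff the encoded strings ($dmp is the diff-match-patch instance created elsewhere).
$diffs = $dmp->diff_main($lineText1, $lineText2);
// Map the encoded characters back to the original words.
$dtk->charsToLines($diffs, $lineArray);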
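For the C# port in the question, the same three steps (encode words as characters, diff the encoded strings, decode each diff) can be sketched roughly as below. This is only a minimal sketch: WordCodec is a hypothetical helper, not part of the library, and it assumes words end at a space or newline. I believe the port's own diff_linesToChars and diff_charsToLines are protected, which is why the sketch keeps its own table instead of calling them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of DiffMatchPatch): maps each distinct
// word to a single char so that a character diff becomes a word diff.
public sealed class WordCodec
{
    // Index 0 is reserved so that no token is encoded as '\0'.
    private readonly List<string> table = new List<string> { "" };
    private readonly Dictionary<string, int> index = new Dictionary<string, int>();

    // Replaces every token with one char. A token runs up to and including
    // the next space or newline, so the tokens concatenate back to the input.
    // Limitation: at most 65535 distinct tokens fit in a char.
    public string Encode(string text)
    {
        var sb = new StringBuilder();
        int start = 0;
        while (start < text.Length)
        {
            int end = text.IndexOfAny(new[] { ' ', '\n' }, start);
            if (end == -1) end = text.Length - 1;
            string token = text.Substring(start, end - start + 1);
            if (!index.TryGetValue(token, out int id))
            {
                table.Add(token);
                id = table.Count - 1;
                index[token] = id;
            }
            sb.Append((char)id);
            start = end + 1;
        }
        return sb.ToString();
    }

    // Expands an encoded string (for example the text of one diff) back to words.
    public string Decode(string encoded)
    {
        var sb = new StringBuilder();
        foreach (char c in encoded) sb.Append(table[c]);
        return sb.ToString();
    }
}
```

To use it, encode both texts with the same WordCodec instance, pass the two encoded strings to diff_main with checklines set to false, and then replace each resulting diff's text with Decode(text). Run diff_cleanupSemantic only after decoding, if at all, because the encoded characters have no meaning of their own.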
Upvotes: 0
Reputation: 514
I have created a sample .NET project using DiffMatchPatch. It's probably an older version of the DiffMatchPatch file, but the word and line modes both work.
For your sample text above, I get the following output:
at these | to those
Upvotes: 4