Reputation: 21
I'd really like to know the best approach for comparing two strings (long text files) for duplicated words and then highlighting those words in the second string, just like Copyscape does. It's for our internal content database.
Am I missing a simple PHP function? Can anyone point me in the right direction?
All I know is to build two arrays and compare them with a foreach loop. But that doesn't feel right, and my script is already at 40 lines without any highlighting.
Upvotes: 0
Views: 2090
Reputation: 302
I think https://github.com/gorhill/PHP-FineDiff might do the job. It compares texts at various granularities, even down to character level if needed.
You can find the phrases the two texts have in common, provided they appear in the same order, by adding
// Collects every chunk that is copied unchanged from $from.
public static $commons = array();

public static function renderCommonsFromOpcodes($from, $opcodes)
{
    FineDiff::renderFromOpcodes($from, $opcodes, array('FineDiff', 'renderCommonsFromOpcode'));
}

private static function renderCommonsFromOpcode($opcode, $from, $from_offset, $from_len)
{
    // 'c' is the copy opcode: this part of $from also appears in the target text.
    if ($opcode === 'c') {
        self::$commons[] = substr($from, $from_offset, $from_len);
    }
}
to the FineDiff class in finediff.php.
Usage:
include 'finediff.php';
$from_text = "PHP FPM is a popular general-purpose scripting language that is especially suited to web development.";
$to_text = "Fast, flexible and pragmatic, PHP FPM powers everything from your blog to the most popular websites in the world";
$opcodes = FineDiff::getDiffOpcodes($from_text, $to_text, FineDiff::wordDelimiters);
FineDiff::renderCommonsFromOpcodes($from_text, $opcodes);
print_r(FineDiff::$commons);
/*
Array
(
[0] => PHP FPM
[1] => popular
)
*/
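To get from the collected phrases to the highlighting the question asks for, something along these lines might work. This is only a sketch: highlightPhrases() is a hypothetical helper, not part of FineDiff, and it assumes the text is already safe to print as HTML. It trims the phrases (on the assumption that copied chunks may carry surrounding whitespace) and wraps every whole-phrase match in a <mark> tag in a single pass.

```php
<?php
// Hypothetical helper (not part of FineDiff): wraps every occurrence of the
// given phrases in <mark> tags. Assumes $text is already safe to print as HTML.
function highlightPhrases(string $text, array $phrases): string
{
    // Trim surrounding whitespace, then drop empty and duplicate phrases.
    $phrases = array_unique(array_filter(array_map('trim', $phrases), function ($p) {
        return $p !== '';
    }));
    if (!$phrases) {
        return $text;
    }

    // Longest first, so a longer phrase wins over a shorter one it contains.
    usort($phrases, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $quoted = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases);

    // The lookarounds keep a phrase from matching in the middle of a longer word.
    $pattern = '/(?<![\p{L}\p{N}])(' . implode('|', $quoted) . ')(?![\p{L}\p{N}])/u';

    // A single preg_replace() call, so inserted markup is never scanned again.
    return preg_replace($pattern, '<mark>$1</mark>', $text);
}
```

With the example above, the call would be echo highlightPhrases($to_text, FineDiff::$commons);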
Upvotes: 0
Reputation: 33823
One method you could play around with is array_intersect: generate two arrays from the two strings you wish to compare, intersect them, and then use a string replacement function to highlight the common words.
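$str1 = 'PHP is a popular general-purpose scripting language that is especially suited to web development.';
$str2 = 'Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world.';

// Split on anything that is not a letter or digit, so punctuation does not stick to the words.
$a1 = preg_split('/[^\p{L}\p{N}]+/u', $str1, -1, PREG_SPLIT_NO_EMPTY);
$a2 = preg_split('/[^\p{L}\p{N}]+/u', $str2, -1, PREG_SPLIT_NO_EMPTY);

// Ignore short words such as "is", "a" and "the".
function longenough($word) {
    return strlen($word) > 3;
}
$a1 = array_filter($a1, 'longenough');
$a2 = array_filter($a2, 'longenough');

// array_unique() keeps a repeated word from being wrapped twice.
$common = array_unique(array_intersect($a1, $a2));

foreach ($common as $word) {
    // preg_quote() escapes regex metacharacters; \b limits the match to whole words.
    $pattern = '@\b(' . preg_quote($word, '@') . ')\b@iu';
    $str2 = preg_replace($pattern, '<span style="color:red">$1</span>', $str2);
}
echo $str2;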
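One weakness of replacing word by word in a loop is that a later word such as "span" or "color" could match inside markup inserted by an earlier iteration. A possible variation, sketched below, walks over the second string once and decides per word whether to wrap it. highlightCommonWords() and its $minLength parameter are hypothetical names used for illustration, case folding uses strtolower() and is therefore ASCII-only (mb_strtolower() could be swapped in for other alphabets), and the input is assumed to be safe to print as HTML.

```php
<?php
// Hypothetical helper: highlights, in $target, every word of at least
// $minLength characters that also occurs in $source. Matching is
// case-insensitive (ASCII only) and applies to whole words.
function highlightCommonWords(string $source, string $target, int $minLength = 4): string
{
    // Lower-case the source and split it on anything that is not a letter or digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($source), -1, PREG_SPLIT_NO_EMPTY);

    // Drop short words such as "is", "a" and "the".
    $words = array_filter($words, function ($w) use ($minLength) {
        return strlen($w) >= $minLength;
    });

    // Use the words as array keys so each lookup below is a cheap isset().
    $lookup = array_flip($words);

    // One pass over the target: every word is inspected exactly once,
    // so the inserted <span> markup is never matched again.
    return preg_replace_callback('/[\p{L}\p{N}]+/u', function ($m) use ($lookup) {
        $word = $m[0];
        return isset($lookup[strtolower($word)])
            ? '<span style="color:red">' . $word . '</span>'
            : $word;
    }, $target);
}
```

Usage would be echo highlightCommonWords($str1, $str2); where the third argument is optional and defaults to the same "longer than 3 characters" rule as above.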
Upvotes: 1