Reputation: 1617
Is there a function in PHP that checks the percentage of similarity between two strings?
For example, I have:
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";
and function($string1, $string2)
would return true because the words "how", "are", and "you" are present in both strings.
Or, even better, it would return 60% similarity, because "how", "are", and "you" make up 3/5 of the words in $string1.
Does any PHP function do that?
Upvotes: 23
Views: 22674
Reputation: 156
This question is quite old, but I am adding my solution for a few reasons. First, according to the author's comment, he wanted to compare similar words rather than whole strings. Second, most of the answers try to solve it with similar_text, which is not suitable for this problem: it compares the text character by character, so quite different strings can still come out as similar. The first answer, by @Hugo Delsing, uses array_flip, which swaps keys and values, so a word that is repeated is counted only once.
The function below compares words instead. Its main limitation is that it takes little account of the order of the words.
function compareStrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // split into lowercase words; -1 means "no limit" (passing null is deprecated since PHP 8.1)
    $ar1 = preg_split('/[^\w\-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/[^\w\-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY);
    $l1 = count($ar1);
    if ($l1 == 0) {
        // the first string contains no words, so avoid dividing by zero below
        return 0;
    }

    $ar2_copy = array_values($ar2);
    $matched_indices = [];
    $word_map = [];

    foreach ($ar1 as $k => $w1) {
        if (!empty($word_map[$w1])) {
            // repeated word: use the next remembered position from the second string
            if ($word_map[$w1][0] >= $k) {
                $matched_indices[$k] = $word_map[$w1][0];
            }
            array_splice($word_map[$w1], 0, 1);
        } else {
            $indices = array_keys($ar2_copy, $w1);
            $index_count = count($indices);
            if ($index_count) {
                if ($index_count == 1) {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given index from second array so that it won't repeat again
                    unset($ar2_copy[$indices[0]]);
                } else {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given indices from second array so that it won't repeat again
                    foreach ($indices as $index) {
                        unset($ar2_copy[$index]);
                    }
                    // remember the remaining positions for later repeats of this word
                    array_splice($indices, 0, 1);
                    $word_map[$w1] = $indices;
                }
            }
        }
    }

    // percentage of words from the first string that were matched
    return round(count($matched_indices) * 100 / $l1, 2);
}
Upvotes: 0
Reputation: 17205
In addition to Alex Siri's answer and according to the following article:
http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm
PHP provides several functions that let you test whether two strings are approximately equal:
$string1="Hello how are you doing" ;
$string2= " hi, how are you";
SOUNDEX
if (soundex($string1) == soundex($string2)) {
    echo "similar";
} else {
    echo "not similar";
}
METAPHONE
if (metaphone($string1) == metaphone($string2)) {
    echo "similar";
} else {
    echo "not similar";
}
SIMILAR TEXT
// number of matching characters
$similarity = similar_text($string1, $string2);
LEVENSHTEIN
// number of single-character edits needed to turn one string into the other
$distance = levenshtein($string1, $string2);
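Neither of the last two calls returns a percentage by itself: similar_text returns a count of matching characters and levenshtein returns an edit distance. A minimal sketch of turning both into a 0-100 score follows. The helper name levenshteinPercent is made up for this illustration, and dividing by the longer string's length is just one possible normalization, not something PHP defines.

```php
<?php
// Hypothetical helper (not a PHP built-in): scale a Levenshtein distance to 0-100.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are treated as identical
    }
    // levenshtein() counts bytes, so multibyte text is only approximated
    return round((1 - levenshtein($a, $b) / $maxLen) * 100, 2);
}

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// the optional third argument of similar_text() receives the percentage
similar_text($string1, $string2, $percent);
echo round($percent, 2), "\n";

echo levenshteinPercent($string1, $string2), "\n";
```

Both scores are character based, so they still do not answer the "how many words match" part of the question.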
Upvotes: 13
Reputation: 2864
As other answers have already said, you can use similar_text. Here's the demonstration:
$string1="Hello how are you doing" ;
$string2= " hi, how are you";
echo similar_text($string1, $string2, $perc); //12
echo $perc; //61.538461538462
This returns 12 (the number of matching characters) and stores the percentage of similarity you asked for in $perc.
Upvotes: 11
Reputation: 1844
You can use the PHP function similar_text.
int similar_text ( string $first , string $second [, float &$percent ] )
Check the PHP doc at: http://php.net/manual/en/function.similar-text.php
Upvotes: 0
Reputation: 14163
As it's a nice question, I put some effort into it:
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
//60%

function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    //replace non-alphanumeric characters
    //I left - in case it's used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    //collapse runs of spaces and drop leading/trailing ones
    $s1clean = trim(preg_replace('/ +/', ' ', $s1clean));
    $s2clean = trim(preg_replace('/ +/', ' ', $s2clean));

    //nothing but punctuation in one of the strings, so no result
    if ($s1clean === '' || $s2clean === '') {
        return 0;
    }

    //create arrays
    $ar1 = explode(" ", $s1clean);
    $ar2 = explode(" ", $s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    //swap the arrays if needed so ar1 is always the largest
    if ($l2 > $l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);

    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach ($ar1 as $word) {
        if (array_key_exists($word, $ar2)) {
            $matches++;
        }
    }

    return ($matches / $maxwords) * 100;
}
?>
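For comparison, the same word-counting idea can be sketched more compactly with preg_split and array_intersect. This is an illustrative variant, not the code above: the function name wordOverlapPercent is made up, it lowercases the input, and it counts each distinct word only once.

```php
<?php
// Hypothetical helper name; a sketch of the word-overlap idea, not the original code.
function wordOverlapPercent(string $s1, string $s2): float
{
    // lowercase, then split on anything that is not a letter, digit or hyphen
    $w1 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY));
    $w2 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY));

    $max = max(count($w1), count($w2));
    if ($max === 0) {
        return 0.0; // no words in either string
    }

    // shared distinct words, relative to the longer word list
    return count(array_intersect($w1, $w2)) * 100 / $max;
}

echo wordOverlapPercent("Hello how are you doing", " hi, how are you"), "%\n"; // 60%
```

Whether to divide by the longer list, the shorter list, or always the first string is a design choice; the question asks for the share of $string1, which happens to give the same 60% here.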
Upvotes: 40
Reputation: 1617
OK, here is my function, which makes this a bit more interesting.
I'm checking the approximate similarity of strings.
Here are the criteria I use.
Example:
$string1 = "How much will it cost to me" (string in the vocabulary)
$string2 = "How much does costs it " (user input; "costs" instead of "cost" is a mistake)
Algorithm:
1) Check the similarity of the words and build a clean string of the "right" words in the order they appear in the vocabulary. OUTPUT: "how much it cost"
2) Build a clean string of the "right" words in the order they appear in the user input. OUTPUT: "how much cost it"
3) Compare the two outputs: if they are the same, return yes; otherwise return no.
error_reporting(E_ALL);
ini_set('display_errors', true);

// Russian test phrases, roughly "how much does it cost anyway"
// and "how much will cost it will to me"
$string1 = "сколько это стоит ваще";
$string2 = "сколько будет стоить это будет мне";

if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo 'no';
}

function compareStrings($s1, $s2) {
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }

    // collapse runs of whitespace and drop leading/trailing spaces
    $s1 = trim(preg_replace('/\s+/', ' ', $s1));
    $s2 = trim(preg_replace('/\s+/', ' ', $s2));

    $ar1 = explode(" ", $s1);
    $ar2 = explode(" ", $s2);
    $l1 = count($ar1);
    $l2 = count($ar2);

    $meaning = "";     // matched words in vocabulary order
    $rightorder = "";  // vocabulary words at the positions used in the user input

    for ($i = 0; $i < $l1; $i++) {
        for ($j = 0; $j < $l2; $j++) {
            // similar_text() compares bytes; $percent receives the similarity
            similar_text($ar1[$i], $ar2[$j], $percent);
            if ($percent >= 85) {
                $meaning = $meaning . " " . $ar1[$i];
                // $j can run past the end of $ar1 when the user input is longer
                $rightorder = $rightorder . " " . ($ar1[$j] ?? '');
            }
        }
    }

    // same words in the same order means the strings are considered similar
    return $rightorder == $meaning;
}
I would love to hear your opinions and suggestions on how to improve it.
Upvotes: 0