Ilya Libin

Reputation: 1617

How to check a partial similarity of two strings in PHP

Is there a function in PHP that checks the percentage of similarity between two strings?

For example, I have:

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

and function($string1, $string2) would return true because the words "how", "are" and "you" are present in both strings.

Or even better, it would return 60% similarity, because "how", "are" and "you" make up 3/5 of $string1.

Does any function in PHP do that?

Upvotes: 23

Views: 22674

Answers (6)

Raheel Shahzad

Reputation: 156

This question is quite old, but I'm adding my solution for a few reasons. First, according to his comment, the author wanted to compare similar words rather than whole strings. Second, most of the answers try to solve it with similar_text, which is not suitable for this problem: it compares the text character by character, so quite different strings can still come out as a match. The first answer, by @Hugo Delsing, uses array_flip, which swaps keys and values, so a word that is repeated in the string ends up as a single key and is only considered once. The function below compares the words instead. Its only weakness is that it doesn't take the order of the words into account very much.

function compareStrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
    $ar1 = preg_split('/[^\w\-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/[^\w\-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY);

    $l1 = count($ar1);
    $l2 = count($ar2);

    $ar2_copy = array_values($ar2);

    $matched_indices = [];
    $word_map = [];
    foreach ($ar1 as $k => $w1) {
        // !empty instead of isset: the list may already have been used up by earlier repeats
        if (!empty($word_map[$w1])) {
            if ($word_map[$w1][0] >= $k) {
                $matched_indices[$k] = $word_map[$w1][0];
            }
            array_splice($word_map[$w1], 0, 1);
        } else {
            $indices = array_keys($ar2_copy, $w1);
            $index_count = count($indices);
            if ($index_count) {
                if ($index_count == 1) {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given index from second array so that it won't repeat again
                    unset($ar2_copy[$indices[0]]);
                } else {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given indices from second array so that it won't repeat again
                    foreach ($indices as $index) {
                        unset($ar2_copy[$index]);
                    }
                    array_splice($indices, 0, 1);
                    $word_map[$w1] = $indices;
                }
            }
        }
    }
    return round(count($matched_indices) * 100 / $l1, 2);
}

Upvotes: 0

RafaSashi

Reputation: 17205

In addition to Alex Siri's answer and according to the following article:

http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm

PHP provides several functions that let you test whether two strings are approximately equal:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

SOUNDEX

if (soundex($string1) == soundex($string2)) {
    echo "similar";
} else {
    echo "not similar";
}

METAPHONE

if (metaphone($string1) == metaphone($string2)) {
    echo "similar";
} else {
    echo "not similar";
}

SIMILAR TEXT

$similarity = similar_text($string1, $string2);

LEVENSHTEIN

$distance = levenshtein($string1, $string2); 
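
All four functions work on characters (or sounds), not on words, so they are easiest to judge on short inputs. Here is a small sketch of what each one returns; the sample words are chosen for illustration and are not from the question:

```php
<?php
// Sample inputs are illustrative assumptions, not taken from the question.

// soundex() and metaphone() return phonetic keys; equal keys mean "sounds alike".
$sameSoundex  = soundex("Robert") === soundex("Rupert"); // both keys are "R163"
$metaphoneKey = metaphone("Thompson");                   // "TMSN"

// similar_text() returns the number of matching characters.
$common = similar_text("World", "Word");                 // 4

// levenshtein() returns the number of single-character edits needed.
$edits = levenshtein("kitten", "sitting");               // 3

var_dump($sameSoundex, $metaphoneKey, $common, $edits);
```

For whole sentences, similar_text with its percentage argument or a word-based comparison is probably closer to what the question asks for.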

Upvotes: 13

Alex Siri

Reputation: 2864

As other answers have already said, you can use similar_text. Here's the demonstration:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

echo similar_text($string1, $string2, $perc); //12
echo $perc; //61.538461538462

similar_text returns 12 (the number of matching characters) and stores the percentage of similarity in $perc, as you asked for.
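
For reference, the percentage follows from the return value: it is twice the number of matching characters divided by the combined length of both strings. A quick check of that relationship (the variable names $matched and $manual are my own):

```php
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// number of matching characters; $perc receives the percentage
$matched = similar_text($string1, $string2, $perc);

// percentage = matched * 2 / (length1 + length2) * 100
$manual = $matched * 2 / (strlen($string1) + strlen($string2)) * 100;

echo round($perc, 2), " ", round($manual, 2), "\n";
```

Because the measure is character-based, it is a percentage of the characters, not of the words.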

Upvotes: 11

Salvi Pascual

Reputation: 1844

You can use the PHP function similar_text.

similar_text(string $string1, string $string2, float &$percent = null): int

Check the PHP doc at: http://php.net/manual/en/function.similar-text.php

Upvotes: 0

Hugo Delsing

Reputation: 14163

As it's a nice question, I put some effort into it:

<?php
$string1="Hello how are you doing";
$string2= " hi, how are you";

echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
//60%


function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    //replace non-alphanumeric characters with spaces
    //I left "-" in, in case it's used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    //remove double spaces
    while (strpos($s1clean, "  ")!==false) {
        $s1clean = str_replace("  ", " ", $s1clean);
    }
    while (strpos($s2clean, "  ")!==false) {
        $s2clean = str_replace("  ", " ", $s2clean);
    }

    //create arrays (trim first so a leading or trailing space doesn't become an empty "word")
    $ar1 = explode(" ", trim($s1clean));
    $ar2 = explode(" ", trim($s2clean));
    $l1 = count($ar1);
    $l2 = count($ar2);

    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);


    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }

    return ($matches / $maxwords) * 100;    
}
?>
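
If the percentage should be relative to the words of the first string (the 3/5 from the question) and ignore upper/lower case, a shorter variant is possible. This is only a sketch under a few assumptions: the function name wordOverlapPercent is made up, words are lowercased with strtolower (so ASCII text only), and repeated words are counted once.

```php
<?php
// Hypothetical helper, not part of PHP:
// percentage of the distinct words of $s1 that also occur in $s2.
function wordOverlapPercent($s1, $s2) {
    // lowercase, then split on anything that is not a letter, digit or hyphen
    $w1 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY));
    $w2 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY));

    // nothing to compare against
    if (count($w1) == 0) {
        return 0;
    }

    // words present in both lists
    $common = array_intersect($w1, $w2);

    return count($common) * 100 / count($w1);
}

echo wordOverlapPercent("Hello how are you doing", " hi, how are you") . '%';
```

Unlike the function above, this one divides by the word count of the first argument, so swapping the arguments can change the result.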

Upvotes: 40

Ilya Libin

Reputation: 1617

OK, here is my function, which makes it much more interesting.

I'm checking the approximate similarity of strings.

Here are the criteria I use for that:

  1. The order of the words is important.
  2. Two words count as the same word if they are at least 85% similar.
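
To get a feel for the 85% threshold in criterion 2, here is a quick check on one pair of words with similar_text. The pair "cost"/"costs" is just a convenient sample, and $passes is a name I made up:

```php
<?php
// 4 matching characters, lengths 4 and 5: 4 * 2 / (4 + 5) * 100
similar_text("cost", "costs", $percent);

echo round($percent, 2); // 88.89

// true: the pair clears the 85% threshold
$passes = $percent >= 85;
```

Very short words are much more sensitive: a single differing character in a three-letter word already drops the percentage well below 85.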

Example:

$string1 = "How much will it cost to me";   // string in the vocabulary
$string2 = "How much does costs it ";       // user input ("costs" instead of "cost" is a mistake)

Algorithm:

  1. Check the similarity of the words and build a clean string of the "right" words in the order they appear in the vocabulary. OUTPUT: "how much it cost"
  2. Build a clean string of the "right" words in the order they appear in the user input. OUTPUT: "how much cost it"
  3. Compare the two outputs: if they are the same, return yes; otherwise return no.

error_reporting(E_ALL);
ini_set('display_errors', true);

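// Russian, roughly: "how much does it cost anyway"
$string1 = "сколько это стоит ваще";
// Russian, roughly: "how much will cost this will be to me"
$string2 = "сколько будет стоить это будет мне";

if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo "no";
}
//echo compareStrings($string1, $string2);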

function compareStrings($s1, $s2) {

    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }

    // split on runs of spaces, dropping empty pieces
    $ar1 = preg_split('/ +/', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/ +/', $s2, -1, PREG_SPLIT_NO_EMPTY);

    // step 1: "right" words in the order they appear in the vocabulary string
    $meaning = [];
    foreach ($ar1 as $w1) {
        foreach ($ar2 as $w2) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $meaning[] = $w1;
                break;
            }
        }
    }

    // step 2: the same vocabulary words in the order they appear in the user input
    $rightorder = [];
    foreach ($ar2 as $w2) {
        foreach ($ar1 as $w1) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $rightorder[] = $w1;
                break;
            }
        }
    }
    //print_r($rightorder);

    // step 3: same words in the same order?
    return $meaning === $rightorder;
}

I would love to hear your opinions and suggestions on how to improve it.

Upvotes: 0
