Em Fhal

Reputation: 152

Top 10 keywords in a string (PHP)

I built a complicated array of keywords, but my goal is simply to show the 10 most frequent words in the string.

I also want to show only meaningful words, not words like "The, That, to, a...".

The Full Code:

$str= $db_tag;
    $tok = strtok($str, ", ");
    $subStrStart = 0;

    while ($tok !== false) {
        preg_match_all("/\b" . preg_quote($tok, "/") . "\b/", substr($str, $subStrStart), $m);
        if(count($m[0]) >= 10)
            echo "'" . $tok . "' found more than 10 times, exactly: " . count($m[0]) . "<br>";
        $subStrStart += strlen($tok);
        $tok = strtok(", ");
    }    

My string:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

Thanks in advance.

Upvotes: 0

Views: 160

Answers (4)

Malek Tubaisaht

Reputation: 1387

Use the function below to extract search keywords from a string:

// Sample text; pass any string to getKeywords().
$text = "North Korea has recently introduced a sweeping new law which seeks to stamp out any kind of foreign influence - harshly punishing anyone caught with foreign films, clothing or even using slang. But why? Yoon Mi-so says she was 11 when she first saw a man executed for being caught with a South Korean drama. His entire neighbourhood was ordered to watch. If you didn't, it would be classed as treason, she told the BBC from her home in Seoul. The North Korean guards were making sure everyone knew the penalty for smuggling illicit videos was death. I have a strong memory of the man who was blindfolded, I can still see his tears flow down. That was traumatic for me. The blindfold was completely drenched in his tears.";
echo getKeywords($text);

function getKeywords($string)
{
    $vowels = ["a", "e", "i", "o", "u"];
    // Stop words, written without vowels to match the stripped text below
    $ignore = ["th", "thy", "sh"];
    // Remove lowercase vowels, so "the" becomes "th"
    $string = str_replace($vowels, "", $string);

    // Create array of words split by spaces
    $words = explode(" ", $string);

    // Create an empty array to hold data
    $wordData = [];

    foreach ($words as $word) {
        // Convert to lower case (for uniformity)
        $word = trim(strtolower($word));
        // Skip very short words and ignored words
        if (strlen($word) < 3) {
            continue;
        }
        if (in_array($word, $ignore, true)) {
            continue;
        }
        // Add to the array if it doesn't exist; if it does,
        // increment its count
        if (isset($wordData[$word])) {
            $wordData[$word]++;
        } else {
            $wordData[$word] = 1;
        }
    }

    // Order $wordData array by count, highest first
    arsort($wordData);

    // Build a comma-separated list of the first 10 words
    $result = "";
    $count = 0;

    foreach ($wordData as $key => $value) {
        $count++;
        $result .= $key . ",";
        if ($count == 10) break;
    }

    return $result;
}
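
For free text with punctuation, splitting on spaces alone leaves tokens such as "law." and "law" counted separately. The following is only a sketch of one alternative, not the function above: the name keywordsFromText, its parameters, and the choice to keep vowels and use a plain lowercase stop list are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns up to $limit keywords from free text.
function keywordsFromText(string $text, array $stopWords, int $limit = 10): array
{
    // Lowercase, then split on anything that is not a letter or apostrophe.
    $words = preg_split("/[^a-z']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop words shorter than 3 characters and stop words (assumed lowercase).
    $words = array_filter($words, function ($w) use ($stopWords) {
        return strlen($w) >= 3 && !in_array($w, $stopWords, true);
    });

    // Count occurrences and sort by count, highest first.
    $counts = array_count_values($words);
    arsort($counts);

    // Return only the words, without their counts.
    return array_keys(array_slice($counts, 0, $limit, true));
}

$keywords = keywordsFromText("The law is the law, and the law bans foreign films.", ["the"], 1);
print_r($keywords);
```

Here "law" occurs three times and every other remaining word once, so it is the single keyword returned.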

Upvotes: 0

Will

Reputation: 24699

Try this:

$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

$stopWords = array(
    "the", "to", "in", "a", "of", "is", "that", "will", "and", "be"
);

// Convert to array and filter out stopwords (case-insensitively).
$words = array_filter(explode(',', $db_tag), function ($value) use ($stopWords) {
    return !in_array(strtolower($value), $stopWords);
});

$counts = array_count_values($words);
asort($counts);
$topTen = array_reverse(array_slice($counts, -10, null, true));

var_dump($topTen);

You should see something like this (words with equal counts may appear in a different order depending on the PHP version):

php > var_dump($topTen);
array(10) {
  ["England"]=>
  int(5)
  ["Bank"]=>
  int(5)
  ["Brexit"]=>
  int(5)
  ["Vote"]=>
  int(4)
  ["Economy"]=>
  int(4)
  ["Cut"]=>
  int(1)
  ["Mount"]=>
  int(1)
  ["Expectations"]=>
  int(1)
  ["As"]=>
  int(1)
  ["Week"]=>
  int(1)
}

First, we split the string into an array with explode() and drop the stop words with array_filter(). Then array_count_values() returns an array that maps each unique word to the number of times it occurs in the string.

Next, we sort the array in place by value using asort(). Then we take the last 10 elements of the array (the highest counts) with array_slice() and reverse them with array_reverse() to put them in descending order (optional).

Upvotes: 2

Ben

Reputation: 9001

If by "Top 10" you mean the "10 Most-Used Words" in a string, separated by commas (,), you can do:

$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

//Create array of words split by ","
$words = explode(",",$string);

//Create an empty array to hold data
$wordData = [];

foreach($words as $word){
    //Convert to lower case (for uniformity)
    $word = strtolower($word);

    //Add to an array if doesn't exist; if it does,
    //add to the number
    if(isset($wordData[$word])){
        $wordData[$word]++;
    } else $wordData[$word] = 1;
}

//Order $wordData array by number
arsort($wordData);

print_r($wordData);

This will output:

Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [the] => 2 [expectations] => 1 [will] => 1 [of] => 1 [that] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )


To filter out specific words:

//Establish array of words to filter
$filterWords = ["the", "is", "are", "of", "that"];

//Remove those words from the array created earlier
foreach($filterWords as $fw){
    if(isset($wordData[$fw])) unset($wordData[$fw]);
}

print_r($wordData);

This will output:

Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [expectations] => 1 [will] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )
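
The same filtering can also be written without a loop. This is a small sketch, assuming the counts are already keyed by lowercase word as in the code above; the sample counts are made up for illustration, and the final array_slice() step that keeps only ten entries is an addition that the code above does not perform.

```php
<?php
// Word counts keyed by lowercase word (illustrative sample data).
$wordData = ["england" => 5, "the" => 2, "vote" => 4, "of" => 1];
$filterWords = ["the", "is", "are", "of", "that"];

// array_flip() turns the filter list into keys, so array_diff_key()
// drops every entry of $wordData whose key is a filter word.
$filtered = array_diff_key($wordData, array_flip($filterWords));

// Sort by count, highest first, and keep at most ten entries.
arsort($filtered);
$topTen = array_slice($filtered, 0, 10, true);

print_r($topTen);
```

With this sample data only "england" (5) and "vote" (4) remain.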

Upvotes: 1

Thomas Ayoub

Reputation: 29441

You can use explode and an array:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$array = array();
foreach (explode(',', $db_tag) as $val) 
{
    if(!isset($array[$val]))
    {
        $array[$val] = 1;
    }
    else
    {
        $array[$val]++;
    }
}
arsort($array);
print_r($array);

This will output:

Array
(
    [England] => 5
    [Bank] => 5
    [Brexit] => 5
    [Vote] => 4
    [Economy] => 4
    [The] => 2
    [Expectations] => 1
    [Will] => 1
    [Of] => 1
    [That] => 1
    [Mount] => 1
    [This] => 1
    [As] => 1
    [Week] => 1
    [Boost] => 1
    [Post] => 1
    [A] => 1
    [Given] => 1
    [Be] => 1
    [Could] => 1
    [Cut] => 1
)
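
Several words in this output share the same count, and arsort() only guarantees that equal elements keep their original relative order from PHP 8.0 onwards. If a predictable order matters, one option is to break ties explicitly. The sketch below is an assumption about what might be wanted (alphabetical tie-breaking on a shortened sample string), not part of the answer above.

```php
<?php
// Count words in a short sample string.
$counts = array_count_values(explode(',', "Vote,Bank,Cut,Bank,Vote,As"));

// Sort by count (descending); break ties alphabetically so the order
// does not depend on how the sort treats equal elements.
uksort($counts, function ($a, $b) use ($counts) {
    return ($counts[$b] <=> $counts[$a]) ?: strcmp($a, $b);
});

// Keep at most the first ten entries.
$topTen = array_slice($counts, 0, 10, true);
print_r($topTen);
```

For this sample, "Bank" and "Vote" (2 each) come first in alphabetical order, followed by "As" and "Cut" (1 each).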

Upvotes: 1
