Reputation: 152
I built a complex array of keywords, but my goal is simply to present the top 10 words that appear in the string.
The Full Code:
$str= $db_tag;
$tok = strtok($str, ", ");
$subStrStart = 0;
while ($tok !== false) {
preg_match_all("/\b" . preg_quote($tok, "/") . "\b/", substr($str, $subStrStart), $m);
if(count($m[0]) >= 10)
echo "'" . $tok . "' found at least 10 times, exactly: " . count($m[0]) . "<br>";
$subStrStart += strlen($tok);
$tok = strtok(", ");
}
My string:
$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
Thanks in advance.
Upvotes: 0
Views: 160
Reputation: 1387
Use the function below to extract search keywords from a string:
function getKeywords($string)
{
$string = "North Korea has recently introduced a sweeping new law which seeks to stamp out any kind of foreign influence - harshly punishing anyone caught with foreign films, clothing or even using slang. But why?Yoon Mi-so says she was 11 when she first saw a man executed for being caught with a South Korean drama. His entire neighbourhood was ordered to watch. If you didn't, it would be classed as treason, she told the BBC from her home in Seoul. The North Korean guards were making sure everyone knew the penalty for smuggling illicit videos was death. I have a strong memory of the man who was blindfolded, I can still see his tears flow down. That was traumatic for me. The blindfold was completely drenched in his tears. ";
$vowels = ["a","e","i","o","u"];
$ignore = ["th","thy","sh"];
$string = str_replace($vowels, "", $string);
//Create array of words split by spaces
$words = explode(" ",$string);
//Create an empty array to hold data
$wordData = [];
foreach($words as $word){
//Convert to lower case (for uniformity)
$word = trim(strtolower($word));
if(strlen($word)<3)
continue;
if(in_array($word, $ignore, true)) continue;
//Add to an array if doesn't exist; if it does,
//add to the number
if(isset($wordData[$word])){
$wordData[$word]++;
} else $wordData[$word] = 1;
}
//Order $wordData array by number
arsort($wordData);
$x = (array_keys($wordData));
$result = "";
$count = 0;
foreach ($wordData as $key => $value) {
$count++;
$result .=$key . ",";
if($count==10) break;
}
return $result;
}
Upvotes: 0
Reputation: 24699
Try this:
$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$stopWords = array(
"the", "to", "in", "a", "of", "is", "that", "will", "and", "be"
);
// Convert to array and filter out stopwords.
// Note: in_array() is case-sensitive, so "The" does not match "the".
$words = array_filter(explode(',', $db_tag), function ($value) use ($stopWords) {
return !in_array($value, $stopWords);
});
$counts = array_count_values($words);
asort($counts);
$topTen = array_reverse(array_slice($counts, -10, null, true));
var_dump($topTen);
You should see:
php > var_dump($topTen);
array(10) {
["England"]=>
int(5)
["Bank"]=>
int(5)
["Brexit"]=>
int(5)
["Economy"]=>
int(4)
["Vote"]=>
int(4)
["The"]=>
int(2)
["Post"]=>
int(1)
["Given"]=>
int(1)
["A"]=>
int(1)
["Could"]=>
int(1)
}
First, we split the string into an array with explode(). Then, we return an array of unique values with array_count_values(), each associated with the count of its occurrences in the string.
Next, we sort the array in place by value using asort(). Then, we slice off the last 10 elements of the array (the highest ones) with array_slice() and reverse them with array_reverse() to put them in descending order (optional).
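The steps described here can also be wrapped in a small reusable function. The following is only a sketch under stated assumptions: the name topWords() and its parameters are hypothetical and not part of the answer, and it lower-cases every word so that stop words match regardless of capitalisation.

```php
<?php
// Hypothetical helper (not from the answer above): returns the $limit most
// frequent words in a comma-separated string, skipping stop words
// case-insensitively.
function topWords(string $csv, array $stopWords, int $limit = 10): array
{
    // Normalise to lower case so "The" and "the" count as one word.
    $words = array_map('strtolower', array_map('trim', explode(',', $csv)));

    // Drop empty entries and stop words.
    $words = array_filter($words, function (string $word) use ($stopWords): bool {
        return $word !== '' && !in_array($word, $stopWords, true);
    });

    // Count occurrences, then sort by count, highest first.
    $counts = array_count_values($words);
    arsort($counts);

    // Keep the first $limit entries without renumbering any keys.
    return array_slice($counts, 0, $limit, true);
}

$top = topWords("Bank,bank,The,Vote,Bank,the,Vote,Cut", ["the"], 2);
print_r($top);
```

With this short sample input, "bank" is counted three times and "vote" twice, so those two entries are returned in that order. Using arsort() followed by array_slice() from the front is an alternative to asort() plus array_reverse(); either should give a descending top-N list.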
Upvotes: 2
Reputation: 9001
If by "Top 10" you mean the 10 most-used words in a string, separated by commas (,), you can do:
$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
//Create array of words split by ","
$words = explode(",",$string);
//Create an empty array to hold data
$wordData = [];
foreach($words as $word){
//Convert to lower case (for uniformity)
$word = strtolower($word);
//Add to an array if doesn't exist; if it does,
//add to the number
if(isset($wordData[$word])){
$wordData[$word]++;
} else $wordData[$word] = 1;
}
//Order $wordData array by number
arsort($wordData);
print_r($wordData);
This will output:
Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [the] => 2 [expectations] => 1 [will] => 1 [of] => 1 [that] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )
To filter out specific words:
//Establish array of words to filter
$filterWords = ["the", "is", "are", "of", "that"];
//Remove those words from the array created earlier
foreach($filterWords as $fw){
if(isset($wordData[$fw])) unset($wordData[$fw]);
}
print_r($wordData);
This will output:
Array ( [england] => 5 [bank] => 5 [brexit] => 5 [vote] => 4 [economy] => 4 [expectations] => 1 [will] => 1 [mount] => 1 [this] => 1 [as] => 1 [week] => 1 [boost] => 1 [post] => 1 [a] => 1 [given] => 1 [be] => 1 [could] => 1 [cut] => 1 )
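The filtered array still contains every remaining word, not just ten. As a minimal sketch, assuming the counts are already sorted in descending order, the filtering can be done in one call and the result cut down to ten entries. The sample $wordData here is a shortened, made-up stand-in for the real counts.

```php
<?php
// Assumed input: counts already sorted with arsort(), highest first.
// This shortened array is illustrative only.
$wordData = ["england" => 5, "bank" => 5, "vote" => 4, "the" => 2, "of" => 1, "cut" => 1];
$filterWords = ["the", "is", "are", "of", "that"];

// array_flip() turns the word list into keys so array_diff_key() can
// remove every matching key in one call; the remaining order is unchanged.
$filtered = array_diff_key($wordData, array_flip($filterWords));

// Keep only the first ten entries (fewer if the array is shorter).
$topTen = array_slice($filtered, 0, 10, true);

print_r($topTen);
```

For this sample, "the" and "of" are removed and the four remaining words are kept in their original order.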
Upvotes: 1
Reputation: 29441
You can use explode and an array:
$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$array = array();
foreach (explode(',', $db_tag) as $val)
{
if(!isset($array[$val]))
{
$array[$val] = 1;
}
else
{
$array[$val]++;
}
}
arsort($array);
print_r($array);
This will output:
Array
(
[England] => 5
[Bank] => 5
[Brexit] => 5
[Vote] => 4
[Economy] => 4
[The] => 2
[Expectations] => 1
[Will] => 1
[Of] => 1
[That] => 1
[Mount] => 1
[This] => 1
[As] => 1
[Week] => 1
[Boost] => 1
[Post] => 1
[A] => 1
[Given] => 1
[Be] => 1
[Could] => 1
[Cut] => 1
)
Upvotes: 1