I have a piece of PHP code as follows:
```php
$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    if (strpos(strtolower($text), strtolower($word)) !== false) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}
echo $keywords_list = implode(',', $all_keywords) . "<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";
```
The code echoes `Art,Sport,World Cup,Shakes,William Shakespeare` and `1,2,4,7,8`; however, the code is very simple and not accurate enough to echo the right keywords. For example, it returns `'Shakes' => '7'` because of the word `Shakespeare` in `$text`, but as you can see, "Shakes" cannot represent "Shakespeare" as a proper keyword. Basically, I want to return `Art,Sport,World Cup,William Shakespeare` and `1,2,4,8` instead of `Art,Sport,World Cup,Shakes,William Shakespeare` and `1,2,4,7,8`.

So, could you please help me develop better code to extract the keywords without similar problems? Thanks for your help.
Off the top of my head, I think there are two additional steps that would make this function a bit more robust.

P.S. The SO iOS app rocks! But it's still not easy to write code in it (bloody autocorrect!)
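Replace

```php
strpos(strtolower($text), strtolower($word)) !== false
```

with

```php
preg_match('/\b' . $word . '\b/', $text)
```

(`preg_match()` returns `1` or `0`, so the `!== false` comparison has to go as well; otherwise the condition is always true.)

Or, since you don't seem to care about capital letters:

```php
preg_match('/\b' . strtolower($word) . '\b/', strtolower($text))
```

In that case, I suggest that you perform `strtolower($text)` beforehand, for instance just before the `foreach` starts.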
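Putting that suggestion together, the whole loop could look like the minimal sketch below. It assumes the `$words` array and `$text` from the question, and it swaps `strtolower()` for the `i` (case-insensitive) pattern modifier; `preg_quote()` escapes any regex metacharacters a keyword might contain. The function name `find_keywords` is a hypothetical helper, not something from the question.

```php
<?php
// Hypothetical helper: returns [matched keywords, their ids] using
// whole-word, case-insensitive matching instead of strpos().
function find_keywords(array $words, string $text): array
{
    $keywords = [];
    $keys = [];
    foreach ($words as $word => $key) {
        // \b = word boundary, so "Shakes" no longer matches inside "Shakespeare";
        // preg_quote() escapes characters such as "." or "+" in the keyword.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text)) {
            $keywords[] = $word;
            $keys[] = $key;
        }
    }
    return [$keywords, $keys];
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

[$all_keywords, $all_keys] = find_keywords($words, $text);
echo implode(',', $all_keywords), "\n"; // World Cup,William Shakespeare
echo implode(',', $all_keys), "\n";     // 4,8
```

Note that strict word boundaries also drop `Art` and `Sport`, because the text only contains "artists" and "sports"; if those forms should count, they need their own entries or a pattern that allows the suffix.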
You're better off using regular expressions if you want accurate matches. I modified your original code to use them instead of `strpos()`, which matches substrings and therefore produces partial matches, as happened with your code.

There's room for improvement, but hopefully you get the basic gist of it. Let me know if you have any questions.

I turned the code into a shell script, so save it as `demo.php` and run `chmod +x demo.php && ./demo.php`.
```php
#!/usr/bin/php
<?php
// Array of regular expressions to match your words/phrases
$words = array(
    '/\b[Aa]rt\b/',
    '/\bI\b/',
    '/\bSport\b/',
    '/\bBig Animals\b/',
    '/\bWorld Cup\b/',
    '/\bDavid Fincher\b/',
    '/\bTorrentino\b/',
    '/\bShakes\b/',
    '/\b[sS]port[s]{0,1}\b/',
    '/\bWilliam Shakespeare\b/',
);

$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$all_keywords = array(); // changed formatting for clarity
$all_keys = array();

foreach ($words as $regex) {
    $m = array();
    // PREG_OFFSET_CAPTURE turns each match into a [matched text, offset] pair
    if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE) >= 1) {
        // $m[0] holds the full-pattern matches (the patterns have no capture groups)
        foreach ($m[0] as $mm) {
            $word = $mm[0]; // the matched word/phrase
            $key = $mm[1];  // key is the offset in $text where the match begins
            $all_keywords[] = $word;
            $all_keys[] = $key;
        }
    }
}

echo "\$text = \"$text\"\n";
echo $keywords_list = implode(',', $all_keywords) . "<br>\n";
echo $keys_list = implode(',', $all_keys) . "<br>\n";
```
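A possible variation, sketched under the assumption that you still want the ids from the question's `$words` array: generate the patterns from that array with `preg_quote()` instead of writing them by hand, keep each keyword's id next to its offset, and sort the hits by where they occur in the text (the loop above groups them by pattern instead). The `$hits` structure is illustrative only.

```php
<?php
$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$hits = array(); // each hit: [offset, matched text, id]
foreach ($words as $word => $id) {
    // Whole-word, case-insensitive pattern built from the keyword itself
    $regex = '/\b' . preg_quote($word, '/') . '\b/i';
    if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE)) {
        // $m[0] is a list of [matched text, byte offset] pairs
        foreach ($m[0] as [$matched, $offset]) {
            $hits[] = array($offset, $matched, $id);
        }
    }
}

// Order hits by their position in $text rather than by pattern
usort($hits, function ($a, $b) {
    return $a[0] <=> $b[0];
});

foreach ($hits as [$offset, $matched, $id]) {
    echo "$offset: $matched ($id)\n";
}
```

The matched text keeps the casing it has in `$text` (so "art" comes back lowercase), while the id still comes from the array key.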
You may want to look at regular expressions to weed out partial matches. This snippet reuses `$words` and `$text` from the question:
```php
// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';

preg_match_all($re, $text, $matches);

foreach ($matches[0] as $keyword) {
    echo $keyword, " ", $words[$keyword], "\n";
}
```
The expression uses the `\b` assertion to match word boundaries, i.e. the keyword must stand on its own.

Output:

```
World Cup 4
William Shakespeare 8
```
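One caveat with this snippet: because of the `i` flag, the matched text can differ in case from the array key (for example "world cup" written in lowercase), and then `$words[$keyword]` has no such index. The minimal sketch below, assuming the question's `$words` array, maps the lowercased match back to the original key. It also sorts the keywords longest first, so that when one keyword is a prefix of another (a hypothetical "World" next to "World Cup"), the longer phrase wins in the alternation. `extract_keywords` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: one alternation regex over all keywords.
// Returns canonical keyword (spelled as in $words) => id,
// ordered by first occurrence in $text.
function extract_keywords(array $words, string $text): array
{
    $keywords = array_keys($words);

    // Longest keywords first, so a phrase like "World Cup" is tried
    // before a shorter keyword that is its prefix (e.g. "World").
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $re = '/\b(?:' . implode('|', array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords)) . ')\b/i';

    // Map lowercased keyword => original array key, so "world cup"
    // in the text still resolves to the "World Cup" entry.
    $canonical = array();
    foreach ($keywords as $keyword) {
        $canonical[strtolower($keyword)] = $keyword;
    }

    $found = array();
    if (preg_match_all($re, $text, $matches)) {
        foreach ($matches[0] as $match) {
            $keyword = $canonical[strtolower($match)];
            // Repeated occurrences reuse the same entry, so each keyword appears once
            $found[$keyword] = $words[$keyword];
        }
    }
    return $found;
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

foreach (extract_keywords($words, $text) as $keyword => $id) {
    echo $keyword, " ", $id, "\n";
}
// World Cup 4
// William Shakespeare 8
```

With the question's text the result is the same as above; the difference shows up when the text spells a keyword in a different case or when keywords overlap.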