Reputation: 4905
OK, say I have a paragraph of text:
After getting cut for the first and last time in his life, Durant watched from the sofa as the American team waltzed into the gold-medal game and then was tested by Spain, ultimately emerging with a 118-107 victory that ended an eight-year gold-medal drought for the senior U.S. men's national team. But the gold-medal drought for the Americans in the FIBA World Championship remains intact, now at 16 years and counting as Team USA prepares to head to Turkey without any of the members of the so-called Redeem Team from Beijing.
What I would like to do is run PHP's preg_match_all for a few keywords (say, 'team' and 'for') on the text, and then retrieve a snippet (maybe 10 words before and 10 words after) for each match found.
Does anyone have any idea how that can be done?
Upvotes: 0
Views: 516
Reputation: 244
Something like this will do the trick, bearing in mind that the keywords you search for have to be within about four words of each other or the pattern will not match. You can change the quantifiers to adjust this, which lets you tune how closely related the keywords must be.
preg_match_all("~([\w]+[\s\- ,]+){0,3}watched([\s\- ,]+[\w]+){0,4}\ssofa([\s\- ,]+[\w]+){0,3}~i", $text, $matches);
Upvotes: 0
Reputation: 655189
You could do this with preg_match_all and the PREG_OFFSET_CAPTURE flag. Here’s an example:
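// Split the text into words and record each word's byte offset.
preg_match_all('/[\w-]+/u', $str, $matches, PREG_OFFSET_CAPTURE);
$term = 'team';
$span = 3; // words of context on each side
for ($i = 0, $n = count($matches[0]); $i < $n; ++$i) {
    $match = $matches[0][$i];
    if (strcasecmp($term, $match[0]) === 0) {
        // Start at the first context word and end after the last one,
        // so the final word is not cut off near the end of the text.
        $start = $matches[0][max(0, $i - $span)][1];
        $last = $matches[0][min($n - 1, $i + $span)];
        $end = $last[1] + strlen($last[0]);
        echo ' … '.substr($str, $start, $end - $start).' … ';
    }
}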
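Since the question asks about several keywords at once, here is a minimal sketch of the same offset-based idea wrapped in a function. The name keyword_snippets and its parameters are hypothetical, not part of PHP. It assumes plain ASCII keywords, because strtolower is not multibyte-aware.

```php
<?php
// Hypothetical helper: returns one snippet per keyword hit, with up to
// $span words of context on each side of the hit.
function keyword_snippets(string $text, array $keywords, int $span = 10): array
{
    // Tokenise into words; each entry is [word, byte offset].
    preg_match_all('/[\w-]+/u', $text, $matches, PREG_OFFSET_CAPTURE);
    $words = $matches[0];
    $n = count($words);

    // Lower-case the keywords once for a case-insensitive comparison.
    $wanted = array_map('strtolower', $keywords);

    $snippets = [];
    foreach ($words as $i => $w) {
        if (!in_array(strtolower($w[0]), $wanted, true)) {
            continue;
        }
        // Clamp the context window to the bounds of the word list.
        $first = $words[max(0, $i - $span)];
        $last  = $words[min($n - 1, $i + $span)];
        $start = $first[1];
        $end   = $last[1] + strlen($last[0]);
        $snippets[] = substr($text, $start, $end - $start);
    }
    return $snippets;
}

$text = 'Durant watched from the sofa as the American team waltzed into the gold-medal game.';
print_r(keyword_snippets($text, ['team', 'for'], 2));
```

With a span of 2 this should yield a single snippet, "the American team waltzed into", because 'for' does not occur as a standalone word in the sample sentence. Hits that sit close together produce overlapping snippets; merging them is left out of this sketch.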
Upvotes: 2
Reputation: 27553
You might find a lot of interesting ideas in the Drupal search excerpt builder.
http://api.drupal.org/api/function/search_excerpt/6
This one is UTF8-safe and has all kinds of edge-cases covered.
Upvotes: 0
Reputation: 34978
Check this http://www.php.net/manual/en/regexp.reference.squarebrackets.php
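Note that POSIX classes are only valid inside a bracket expression, so they have to be written as [[:word:]], not [:word:]. This is one word followed by a separator:
([[:word:]]+[[:punct:][:space:]]+)
These are ten words with separators:
([[:word:]]+[[:punct:][:space:]]+){10}
Something like this would be close to your solution (using {0,10} so that keywords near the start or end of the text still match):
([[:word:]]+[[:punct:][:space:]]+){0,10}team([[:punct:][:space:]]+[[:word:]]+){0,10}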
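As a rough illustration of the single-regex approach, the sketch below builds such a pattern from a keyword and a word count and runs it with preg_match_all. The variable names ($term, $span, $sep, $word) are made up for this example. It writes the POSIX classes inside bracket expressions, which is the form PCRE accepts, and adds \b around the keyword so that 'team' does not match inside a longer word such as 'steam'.

```php
<?php
$text = 'Durant watched from the sofa as the American team waltzed into the gold-medal game.';
$term = 'team'; // keyword to search for (assumed example value)
$span = 3;      // maximum number of context words on each side

$sep  = '[[:punct:][:space:]]+'; // one run of separators
$word = '[[:word:]]+';           // one word

// Up to $span "word + separator" pairs, the keyword, then up to $span
// "separator + word" pairs. preg_quote protects special characters in $term.
$pattern = '/(?:' . $word . $sep . '){0,' . $span . '}'
         . '\b' . preg_quote($term, '/') . '\b'
         . '(?:' . $sep . $word . '){0,' . $span . '}/i';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

One caveat with this design: preg_match_all returns non-overlapping matches, so a second keyword hit that falls inside the trailing context of the first is consumed rather than reported separately.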
Upvotes: 0