Shani1351


PHP get 10 words around a search phrase

I am trying to do the following:

grab 5 words before the search phrase (or fewer, if there are fewer than 5 words there) and 5 words after the search phrase (or fewer, likewise) from a block of text. By "words" I mean words or numbers, whatever is in the block of text.

For example:

The block of text: "Welcome to Stack Overflow! Visit your user page to set your name and email."

If you were to search for "visit your", it would return: "Welcome to Stack Overflow! Visit your user page to set your"
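The desired behaviour can be sketched without a big regex at all: split the text on whitespace, find the phrase's words case-insensitively, and slice up to 5 words on either side. This is only a sketch under stated assumptions: `wordsAround` is a hypothetical helper (not from the question), a "word" is taken to be any run of non-whitespace, and the mbstring extension is assumed to be available for `mb_strtolower()`.

```php
<?php
// Hypothetical helper (not from the question): returns up to $n words before
// and after the first case-insensitive occurrence of $phrase, or null.
// A "word" is any run of non-whitespace, so "Overflow!" or "don't" count as one word.
// Assumes the mbstring extension is available for mb_strtolower().
function wordsAround(string $text, string $phrase, int $n = 5): ?string
{
    $words  = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $needle = preg_split('/\s+/u', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $k = count($needle);
    if ($k === 0) {
        return null;
    }

    // Compare lowercased copies, but return the original words.
    $lowerWords  = array_map('mb_strtolower', $words);
    $lowerNeedle = array_map('mb_strtolower', $needle);

    for ($i = 0; $i + $k <= count($words); $i++) {
        if (array_slice($lowerWords, $i, $k) === $lowerNeedle) {
            $start = max(0, $i - $n);                  // fewer words if near the start
            $end   = min(count($words), $i + $k + $n); // fewer words if near the end
            return implode(' ', array_slice($words, $start, $end - $start));
        }
    }
    return null;
}

echo wordsAround(
    "Welcome to Stack Overflow! Visit your user page to set your name and email.",
    "visit your"
), "\n";
```

This prints "Welcome to Stack Overflow! Visit your user page to set your". Because tokens are compared whole, searching for "email" would not match "email." with its trailing period; stripping punctuation before comparing is one possible refinement.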

I've tried using this:

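// Escape regex metacharacters (including the "/" delimiter), then let the
// spaces in the phrase match whitespace (the x modifier ignores literal spaces).
$preg_safe = str_replace(" ", "\s", preg_quote($search, "/"));
// Up to 8 "words" before and after the phrase; i = case-insensitive, x = extended.
$pattern = "/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/ix";
if (preg_match_all($pattern, $full_text, $matches))
{
    // Wrap the phrase in a highlight span (note: this lowercases the whole excerpt).
    $result = str_replace(strtolower($search), "<span class='searched-for'>$search</span>", strtolower($matches[0][0]));
}
else
{
    $result = false;
}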

It works when the search phrase is in English, but I need it to work in other languages as well. For example, it doesn't work for a Hebrew search phrase.
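Separately from the language issue, the highlighting step in the code above runs `strtolower()` over the whole excerpt, so "Visit your" comes back as "visit your" (and `strtolower()` is not multibyte-aware). A minimal sketch of a case-preserving alternative, assuming a hypothetical helper named `highlight`: it matches the phrase case-insensitively with the `u` modifier and reinserts the matched text via `$0`.

```php
<?php
// Hypothetical helper: wraps each case-insensitive occurrence of $search in the
// question's highlight span while keeping the excerpt's original capitalisation.
function highlight(string $excerpt, string $search): string
{
    $words = preg_split('/\s+/u', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return $excerpt; // nothing to highlight
    }
    // Quote each word (including the "/" delimiter) and allow any whitespace between them.
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/' . implode('\s+', $quoted) . '/iu';
    // $0 is the text that actually matched, e.g. "Visit your" rather than "visit your".
    return preg_replace($pattern, "<span class='searched-for'>\$0</span>", $excerpt);
}

echo highlight("Welcome to Stack Overflow! Visit your user page to set your", "visit your"), "\n";
```

Building the pattern from the individual phrase words, instead of replacing spaces in the quoted string, avoids depending on the x modifier to skip literal spaces.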

I've tried changing the pattern to:

$pattern = "(*UTF8)/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/i";

But it didn't work.

How can I make it work for other languages?

////////////////// EDIT //////////

As enrico.bacis suggested, I've changed the pattern to:

$pattern = "/(\w\p{Hebrew}*\S\s+){0,20}\S*\b($preg_safe)\b\S*(\s\S+){0,20}/ixu";

Now it works for both English and Hebrew search phrases, but the result gets cut off when the text contains a special character (an apostrophe ', for example).

How can I make the pattern return the text around the search phrase even if it contains special characters?


Answers (1)

enrico.bacis


Your problem is the \w, which does not match Hebrew characters: by default, \w is just shorthand for a so-called "word" character, [A-Za-z0-9_].

To make the regex match Hebrew characters as well, you only need to make two changes:

  • Add the u modifier so that UTF-8 characters are handled correctly (your modifiers become /ixu).

  • Replace every occurrence of \w in your pattern with [\w\p{Hebrew}].
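Applied to the question's original pattern, the two changes could look like the sketch below. `$search` and `$full_text` are sample inputs, and `preg_match` is used instead of `preg_match_all` because only the first match is read.

```php
<?php
// The question's original pattern with both changes applied:
// every \w is replaced by the character class [\w\p{Hebrew}], and u is added
// to the modifiers. $search and $full_text are sample inputs.
$search    = "visit your";
$full_text = "Welcome to Stack Overflow! Visit your user page to set your name and email.";

$preg_safe = str_replace(" ", "\s", preg_quote($search, "/"));
$pattern   = "/([\w\p{Hebrew}]*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/ixu";

$result = preg_match($pattern, $full_text, $matches) ? $matches[0] : false;
echo $result, "\n";
```

Note the square brackets: \w\p{Hebrew}* (as written in the question's EDIT) means "one word character followed by any number of Hebrew letters", which is not the same as the character class [\w\p{Hebrew}]*.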

You can also check here for more answers on this topic.
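These changes address Hebrew letters, but the apostrophe problem from the EDIT comes from the "before" group (\w*\S\s+): \w*\S allows at most one non-word character per word, and only as its last character, so a word like "don't" cannot match the group and the leading context stops there. One possible fix is to treat any run of non-whitespace (\S+) as a word on both sides. The sketch below uses a hypothetical helper, `contextAround`, and relies on PHP enabling Unicode properties for patterns with the u modifier, so that \b recognises Hebrew letters as word characters.

```php
<?php
// Hypothetical helper: returns the first match of $search with up to $n words of
// context on each side, or null. A "word" is any run of non-whitespace (\S+),
// so apostrophes and other punctuation no longer cut the context short.
function contextAround(string $text, string $search, int $n = 5): ?string
{
    $words = preg_split('/\s+/u', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return null;
    }
    // Quote each word of the phrase and allow any whitespace between them.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $phrase = implode('\s+', $quoted);

    // i = case-insensitive; u = UTF-8 (PHP also enables Unicode properties with u,
    // so \b treats Hebrew letters as word characters).
    $pattern = '/(?:\S+\s+){0,' . $n . '}\S*\b' . $phrase . '\b\S*(?:\s+\S+){0,' . $n . '}/iu';

    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

echo contextAround("I really don't know what you mean", "what", 2), "\n";
```

This prints "don't know what you mean", keeping the word with the apostrophe that the original "before" group would have dropped.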

