seppzzz

Reputation: 229

PHP preg_match (_all) for text range

I'm trying to get a text range (n words before and after) around a search string (myself).

$text = 'Me, my dog and “myself“ are going on a vacation. Irene and myself are broke. Myself is here :P John and myself!';

preg_match_all("/(?:[^ ]+ ){0,2}(?:[“'\"(‘. ])myself(?:[“'\")‘. ])(?: [^ ]+){0,2}/", $text, $matches);

This gives me these matches:

• dog and “myself“ are going

• myself

But it should be:

• dog and “myself“ are going

• Irene and myself are broke

• John and myself!

Please help me find all matches as a text range of 2 words before and 2 words after, no matter whether there is a special character or whitespace before or after the search string: (myself), 'myself', “myself“ ...

Thanks. Sepp

Upvotes: 1

Views: 211

Answers (1)

Wiktor Stribiżew

Reputation: 626738

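The problem is that both [“'"(‘. ] and [“'")‘. ] are obligatory: each requires exactly one char before and after myself. The word groups (?:[^ ]+ ){0,2} and (?: [^ ]+){0,2} additionally need a space of their own next to myself, so context words are only captured when myself is wrapped in a punctuation char and then a space. In "Irene and myself are", the character classes consume the only spaces around myself, and the word groups match nothing.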
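To see the effect described above, here is a minimal sketch that runs the question's pattern against the sample text. The /u flag is an assumption added here (the question's code does not show it) so that “ and ‘ are treated as single characters rather than raw bytes.

```php
<?php
// Sample text from the question.
$text = 'Me, my dog and “myself“ are going on a vacation. Irene and myself are broke. Myself is here :P John and myself!';

// The question's pattern in a single-quoted string (inner ' escaped).
// Assumption: /u is added so multibyte quotes count as one character each.
$pattern = '/(?:[^ ]+ ){0,2}(?:[“\'"(‘. ])myself(?:[“\'")‘. ])(?: [^ ]+){0,2}/u';

preg_match_all($pattern, $text, $m);

// var_export keeps leading and trailing spaces in the matches visible.
var_export($m[0]);
```

Under that assumption, the second match should come out as ' myself ' with the spaces included: the obligatory character classes took them, so no context words could be attached.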

You may use

'/(?:\S+\s+){0,2}(?:[“\'"(‘.])?myself(?:[“\'")‘.]?)(?:\s+\S+){0,2}/u'

Or allow any punctuation around myself with \p{P}:

'/(?:\S+\s+){0,2}\p{P}?myself\p{P}?(?:\s+\S+){0,2}/u'


Note that (?:[“'"(‘.])? and (?:[“'")‘.]?) (or \p{P}?) are all optional: the ? quantifier makes the regex engine match 1 or 0 occurrences of these patterns, so the match occurs whether or not the punctuation is there.
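As a small illustration of that optionality, the sketch below applies only the \p{P}?myself\p{P}? core to a few variants. The helper name matched_fragment is hypothetical and used only for this example.

```php
<?php
// Hypothetical helper: returns the part of $s matched by the optional-punctuation
// core of the pattern, or null when "myself" does not occur at all.
function matched_fragment(string $s): ?string
{
    // \p{P}? matches zero or one Unicode punctuation character; /u enables UTF-8 mode.
    if (preg_match('/\p{P}?myself\p{P}?/u', $s, $m)) {
        return $m[0];
    }
    return null;
}

foreach (['myself', '(myself)', '“myself“', 'myself!'] as $sample) {
    echo matched_fragment($sample), PHP_EOL;
}
```

Each sample should be matched in full, with or without punctuation on either side.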

PHP demo:

$text = 'Me, my dog and “myself“ are going on a vacation. Irene and myself are broke. Myself is here :P John and myself!';
if (preg_match_all('/(?:\S+\s+){0,2}\p{P}?myself\p{P}?(?:\s+\S+){0,2}/u', $text, $result)) {
    print_r($result[0]);
}

Output:

Array
(
    [0] => dog and “myself“ are going
    [1] => Irene and myself are broke.
    [2] => John and myself!
)
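If the search word and the number of context words need to vary, the same pattern can be built at runtime. This is only a sketch: the function name words_around and its parameters are hypothetical, and preg_quote is used on the assumption that the search string should be matched literally.

```php
<?php
/**
 * Hypothetical helper: returns every occurrence of $needle together with
 * up to $n whitespace-separated words before and after it.
 */
function words_around(string $text, string $needle, int $n = 2): array
{
    $n = max(0, $n);                    // guard against negative counts
    $word = preg_quote($needle, '/');   // escape regex metacharacters and the delimiter

    // Same structure as the \p{P} pattern above, with the word and count substituted in.
    $pattern = '/(?:\S+\s+){0,' . $n . '}\p{P}?' . $word . '\p{P}?(?:\s+\S+){0,' . $n . '}/u';

    preg_match_all($pattern, $text, $m);
    return $m[0];
}

$text = 'Me, my dog and “myself“ are going on a vacation. Irene and myself are broke. Myself is here :P John and myself!';

print_r(words_around($text, 'myself'));     // 2 words of context on each side
print_r(words_around($text, 'myself', 1));  // 1 word of context on each side
```

The match stays case-sensitive, as in the demo above, so "Myself" at the start of a sentence is skipped; adding the i modifier would be one way to change that.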

Upvotes: 1
