JimmyBanks

Reputation: 4708

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with @.

A regex that solves this initial problem is '~(@\w+)~'.

A second requirement is that it must also ignore any matches that occur between [quote] and [/quote] tags.

A couple of attempts that have failed are:

(?:[0-9]+|~(@\w+)~)(?![0-9a-z]*\[\/[a-z]+\])

/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(@\w+)~/i

Example: the following string should have an array output as displayed:

$results = [];
$string = "@friends @john [quote]@and @jane[/quote] @doe";

//run regex match
preg_match_all('regex', $string, $results);

//dump results
var_dump($results[1]);

//results: array consisting of:
    [1]=>"@friends"
    [2]=>"@john"
    [3]=>"@doe"

Upvotes: 1

Views: 80

Answers (1)

Wiktor Stribiżew

Reputation: 626929

You may use the following regex (based on another related question):

'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|@\w+~s'

See the regex demo. The regex accounts for nested [quote] tags.

Details

  • (\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside the capturing parentheses, then (*SKIP)(*F) makes the regex engine discard the matched text and resume searching after it:
    • \[quote] - a literal [quote] string
    • (?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
    • \[/quote] - a literal [/quote] string
  • | - or
  • @\w+ - a @ followed by 1+ word chars.
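
To see what the (*SKIP)(*F) branch contributes, here is a minimal sketch that runs the question's sample string through the plain @\w+ pattern and through the full pattern. The variable names $plain and $filtered are illustrative only.

```php
<?php
$string = "@friends @john [quote]@and @jane[/quote] @doe";

// Plain pattern: matches every @word, including those inside the quote block.
preg_match_all('~@\w+~', $string, $plain);

// Same pattern with the skip branch in front: quoted blocks are consumed and discarded.
preg_match_all('~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|@\w+~s', $string, $filtered);

print_r($plain[0]);    // @friends, @john, @and, @jane, @doe
print_r($filtered[0]); // @friends, @john, @doe
```

Note that the wanted matches are in index 0 (the whole match), because Group 1 only participates in the branch that is thrown away.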

PHP demo:

$results = [];
$string = "@friends @john [quote]@and @jane[/quote] @doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|@\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => @friends [1] => @john [2] => @doe )
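
The nested-tag claim can be checked the same way. The sketch below wraps the pattern in a helper, find_mentions, which is a hypothetical name chosen for illustration, and feeds it a string with one quote nested inside another (the input string is also made up for this example).

```php
<?php
// Hypothetical helper: returns all @mentions that are not inside [quote]...[/quote].
function find_mentions(string $text): array
{
    $rx = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|@\w+~s';
    preg_match_all($rx, $text, $m);
    return $m[0]; // whole matches only
}

// One [quote] block nested inside another.
$nested = "@a [quote]@b [quote]@c[/quote] @d[/quote] @e";
print_r(find_mentions($nested)); // @a and @e only
```

Without the (?1) recursion, the lazy match would stop at the first [/quote], and @d would be reported even though it sits inside the outer quote.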

Upvotes: 2
