Aviel Fedida

Reputation: 4102

PHP regex: matching reserved words between [code] tags

I am trying to find a pattern for the following scenario.

Let's say I have this string:

someString[code]some code[/code]someString

Here, some code can be anything. What I want to extract are the reserved words (break, class, etc.). A more realistic input looks like this:

someString
[code]
class someClass{}
[/code]
someString

// And again

someString
[code]
class someClass{}
[/code]
someString

What I am trying to understand is how to match all the reserved words that appear between the [code] and [/code] tags, for every such pair of tags.

For example, in [code]someReservedWord some text anotherReservedWord[/code] I only want to match someReservedWord and anotherReservedWord.

I was thinking of using preg_match_all to get all the reserved words inside each [code][/code] block, with PREG_OFFSET_CAPTURE to get their positions.

The only thing I can't figure out is the pattern. If anyone has an idea I would be very thankful. Thank you all and have a nice day.

Upvotes: 0

Views: 87

Answers (2)

Casimir et Hippolyte

Reputation: 89547

You can use this:

$pattern = <<<'LOD'
~ (?(DEFINE) (?<words> class | string | function ) )

(?: \[code] | \G(?<!^) )
(?: [^[]+? | \[(?!/code]) )*? \K
\b \g<words> \b

~x
LOD;

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

print_r($matches[0]);

Pattern details:

First of all, we define a named group containing all the reserved words:

(?(DEFINE) (?<words> class | string | function ) )

The (?(DEFINE)...) syntax lets you define subpatterns that are not matched where they are written, only where they are called. You can call the named group "words" later in the pattern with \g<words>.
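As a minimal sketch of the DEFINE mechanism on its own, the snippet below reuses the group name "words" from the answer; the subject string is made up for illustration. The word boundaries keep "classy" from being reported.

```php
<?php
// The DEFINE block only declares "words"; \g<words> is where it is actually applied.
$pattern = '~(?(DEFINE)(?<words>class|string|function))\b\g<words>\b~';

// Illustrative subject (not from the question).
preg_match_all($pattern, 'class Foo { function bar() {} } classy', $m);

print_r($m[0]); // the whole-word hits: "class" and "function"
```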

(?: [^[]+? | \[(?!/code]) )*? describes all the content before a reserved word. This subpattern can match anything except the closing tag [/code], because each step is a choice between "characters that are not a [" and "a [ that is not followed by /code]". Since it can match almost anything, lazy quantifiers are used so that the match stops as soon as a reserved word is encountered.

The entry point of the pattern is (?: \[code] | \G(?<!^) ). This forces a match to begin either at a [code] tag or immediately after the previous match.

(\G is an anchor that means: "at the start of the string or contiguous to the previous match". The negative lookbehind (?<!^) excludes the start of the string.)
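The behaviour of \G, with and without the (?<!^) guard, can be seen in isolation with a toy subject. The digit pattern and the string below are assumptions chosen only to make the effect visible.

```php
<?php
// \G alone: matches must chain from offset 0, each one starting where the last ended.
preg_match_all('~\G\d~', '123a45', $m);
print_r($m[0]); // 1, 2, 3: the chain breaks at "a", so 4 and 5 are never reached

// With (?<!^), \G may no longer succeed at offset 0, so no chain can start here.
preg_match_all('~\G(?<!^)\d~', '123a45', $n);
print_r($n[0]); // empty array
```

In the answer's pattern the chain is started by the \[code] alternative instead, which is why excluding the start of the string is safe there.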

\K is a trick that discards everything matched before it from the match result, so only the reserved word itself is returned.
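Putting the pieces together, here is a small run of the pattern above against a made-up subject (the subject is an assumption, not taken from the question). It contains a "class" outside any [code] block, which should be ignored, and the reported offsets point at the words themselves because of \K.

```php
<?php
$pattern = <<<'LOD'
~ (?(DEFINE) (?<words> class | string | function ) )

(?: \[code] | \G(?<!^) )
(?: [^[]+? | \[(?!/code]) )*? \K
\b \g<words> \b

~x
LOD;

// Illustrative subject: the "class" between the two blocks must not be reported.
$subject = 'a[code]class Foo{}[/code]b class [code]function f(){}[/code]';
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is a [matched text, byte offset] pair.
foreach ($matches[0] as [$word, $offset]) {
    echo "$word at byte offset $offset\n";
}
```

Offsets returned by PREG_OFFSET_CAPTURE are byte offsets, which matters if the subject contains multibyte characters.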

Upvotes: 3

worenga

Reputation: 5856

$str = "someString[code]some code[/code]someString";
$ret = preg_replace('#\[code\](.+)\[\/code\]#iUs', '<FOUND>$1</FOUND>', $str);
var_dump($ret);

(http://www.phpliveregex.com/p/2tD , see preg_match_all example)

You might also want to search for "BBCode PHP regex".
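If a single pattern feels hard to maintain, one possible alternative is a two-step approach: capture each [code] block first, then look for reserved words inside it. The sketch below is only an illustration; the function name findReservedWords, the word list, and the subject are all hypothetical.

```php
<?php
// Hypothetical helper: returns [word, byte offset in $subject] pairs.
function findReservedWords(string $subject, array $words): array
{
    $found = [];

    // Step 1: capture the body of every [code]...[/code] block with its offset.
    preg_match_all('#\[code\](.*?)\[/code\]#is', $subject, $blocks, PREG_OFFSET_CAPTURE);

    // Step 2: build a whole-word alternation and search each body with it.
    $alternation = implode('|', array_map(fn($w) => preg_quote($w, '#'), $words));
    foreach ($blocks[1] as [$body, $bodyOffset]) {
        preg_match_all('#\b(?:' . $alternation . ')\b#', $body, $hits, PREG_OFFSET_CAPTURE);
        foreach ($hits[0] as [$word, $offset]) {
            // Offsets inside the body are relative, so shift them to the full subject.
            $found[] = [$word, $bodyOffset + $offset];
        }
    }

    return $found;
}

$subject = 'a[code]class Foo{}[/code]b class [code]function f(){}[/code]';
$result = findReservedWords($subject, ['class', 'string', 'function']);
print_r($result);
```

This trades one regex pass for two, but each pattern stays simple and the word list can be passed in at run time.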

Upvotes: 0
