gieldops

Reputation: 385

Regex: issue with selecting multiple groups

I need to make 3 groups out of the following text:

[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]

As you can see, each group begins with [startA] and ends with [end], so it should be easy to write a regex that matches this.
The problem is that the string [end] can also appear an arbitrary number of times inside a group.
The regex should match a group that starts with [startA] and ends with the [end] right before the next [startA], not a previous [end].

I think it should be done with lookahead but none of my attempts have worked so far.
Is it possible to do this with a regex?

Upvotes: 0

Views: 53

Answers (2)

Ωmega

Reputation: 43673

You should use a recursive regex pattern:

preg_match_all('/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/', $s, $m);

See this demo.
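As a rough check of the pattern above, here is a minimal sketch that runs it against input modeled on the question. The names $pattern and $count and the loop that builds the sample string are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Build sample input shaped like the question's text:
// three top-level blocks, each containing one nested block.
$s = '';
foreach (['first', 'second', 'third'] as $name) {
    $s .= "[startA]\nthis is the $name group\n [startB]\n blabla\n[end]\n[end]\n";
}

// The recursive pattern from the answer, unchanged.
// (?R) re-enters the whole pattern, so a nested [startB]...[end]
// is consumed before the outer [end] is matched.
$pattern = '/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/';

$count = preg_match_all($pattern, $s, $m);

echo $count, "\n";           // number of top-level blocks found
echo rtrim($m[0][0]), "\n";  // first block, trailing whitespace trimmed
```

Each full match in $m[0] includes the trailing whitespace consumed by \s*, hence the rtrim(). As written, the alternation allows at most one nested block directly inside each block; input with several sibling nested blocks would presumably need a repeated group instead.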

Upvotes: 1

raina77ow

Reputation: 106385

Yes, you can indeed solve this with a lookahead:

$test_string = <<<TEST
[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]
TEST;
preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', 
    $test_string, $matches);
var_dump($matches[1]);

Here's an ideone demo.

The key is the alternation inside the lookahead sub-pattern: it tests for either the next [startA] section or the end of the string ($).
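To see what each part of the lookahead contributes, this sketch compares three variants on the same input. The variable names ($full, $noEnd, $naive) and the loop that builds the sample string are assumptions for illustration only.

```php
<?php
// Sample input shaped like the question's text.
$s = '';
foreach (['first', 'second', 'third'] as $name) {
    $s .= "[startA]\nthis is the $name group\n [startB]\n blabla\n[end]\n[end]\n";
}

// 1. Pattern from the answer: lookahead accepts the next [startA] OR end of string.
$full = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', $s, $mFull);

// 2. Same pattern without the "|$" branch: the last group has no
//    following [startA], so it is not matched.
$noEnd = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA])#s', $s, $mNoEnd);

// 3. No lookahead at all: the lazy .+? stops at the first (inner) [end],
//    so each capture is cut short.
$naive = preg_match_all('#\[startA](.+?)\[end]#s', $s, $mNaive);

echo "$full $noEnd $naive\n"; // match counts for the three variants
```

With the lookahead in place, the captured text in $mFull[1] still contains the inner [end], whereas the captures in $mNaive[1] end just before it.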

Note the s modifier (the trailing s after the closing # delimiter): without it, the . metacharacter won't match newlines ("\n").
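A small sketch of that difference, assuming the same sample input as above ($withS and $withoutS are illustrative names):

```php
<?php
// Sample input shaped like the question's text.
$s = '';
foreach (['first', 'second', 'third'] as $name) {
    $s .= "[startA]\nthis is the $name group\n [startB]\n blabla\n[end]\n[end]\n";
}

// With the s modifier, "." also matches "\n", so .+? can span lines.
$withS = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', $s, $a);

// Without it, "." stops at the newline right after [startA],
// so .+? cannot match even one character there.
$withoutS = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#', $s, $b);

echo $withS, ' ', $withoutS, "\n"; // match counts with and without the modifier
```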

Upvotes: 0
