Reputation: 385
I need to make 3 groups out of the following text:
[startA]
this is the first group
[startB]
blabla
[end]
[end]
[startA]
this is the second group
[startB]
blabla
[end]
[end]
[startA]
this is the third group
[startB]
blabla
[end]
[end]
As you can see, each group begins with [startA] and ends with [end], so it should be easy to write a regex that matches this.
But the problem is that inside a group, the string [end] can appear an arbitrary number of times.
The regex should match a group that starts with [startA] and ends with the [end] right before the next [startA], not an earlier [end].
I think it should be done with lookahead but none of my attempts have worked so far.
Is it possible to do this with a regex?
Upvotes: 0
Views: 53
Reputation: 43673
You should use a recursive regex pattern:
preg_match_all('/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/', $s, $m);
See this demo.
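To make the behaviour of the recursive pattern easier to check, here is a minimal sketch that runs it against the sample text from the question. The variable names ($pattern, $count, $groups) and the trim step are my own additions for illustration; the input string assumes the third block is labelled "third group".

```php
<?php
// Sample input rebuilt from the question (assumed: no trailing newline).
$s = "[startA]\nthis is the first group\n[startB]\nblabla\n[end]\n[end]\n"
   . "[startA]\nthis is the second group\n[startB]\nblabla\n[end]\n[end]\n"
   . "[startA]\nthis is the third group\n[startB]\nblabla\n[end]\n[end]";

// Opening tag (any [tag] except [end]), then either plain text or
// plain text + one recursive block + plain text, then the closing [end].
$pattern = '/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/';

$count  = preg_match_all($pattern, $s, $m);
$groups = array_map('trim', $m[0]); // drop the trailing whitespace eaten by \s*

echo $count, "\n"; // prints 3
print_r($groups);
```

Because (?R) appears only once in the pattern, this handles at most one nested block at each level; a group holding two sibling blocks would probably need the recursive part wrapped in a repeated group.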
Upvotes: 1
Reputation: 106385
Yes, you can indeed solve this with a lookahead:
$test_string = <<<TEST
[startA]
this is the first group
[startB]
blabla
[end]
[end]
[startA]
this is the second group
[startB]
blabla
[end]
[end]
[startA]
this is the third group
[startB]
blabla
[end]
[end]
TEST;
preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s',
$test_string, $matches);
var_dump($matches[1]);
Here's an ideone demo.
The key is the alternation in the lookahead sub-pattern, which tests either for the next [startA] section or for the end of the string ($).
Note the /s modifier: without it, the . meta-character won't match newlines ("\n").
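The effect of the modifier and of the lookahead can be seen in a small comparison. This is only a sketch on a shortened, made-up input; the variable names ($withS, $withoutS, $groups) are hypothetical and not part of the answer above.

```php
<?php
// Shortened, hypothetical input: the first group contains a nested [end].
$text = "[startA]\nfirst\n[startB]\nx\n[end]\n[end]\n[startA]\nsecond\n[end]";

$withS    = '#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s';
$withoutS = '#\[startA](.+?)\[end]\s*(?=\[startA]|$)#';

$countWith    = preg_match_all($withS, $text, $m);
$countWithout = preg_match_all($withoutS, $text, $unused);

// Captures keep the surrounding newlines, so trim them for display.
$groups = array_map('trim', $m[1]);

echo $countWith, "\n";    // prints 2
echo $countWithout, "\n"; // prints 0: "." cannot cross the newline after [startA]
print_r($groups);
```

In the first group the inner [end] is followed by another [end], not by [startA] or the end of the string, so the lookahead fails there and the lazy .+? keeps extending until the outer [end].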
Upvotes: 0