Reputation: 35720
For example, I have a string like this:
{% a %}
{% b %}
{% end %}
{% end %}
I want to get the content between {% a %} and {% end %}, which is {% b %} {% end %}.
I used to use {% \S+ %}(.*){% end %} to do this, but it breaks when I add a c block after it:
{% a %}
{% b %}
{% end %}
{% end %}
{% c %}
{% end %}
It no longer works. How can I do this with a regular expression?
Upvotes: 3
Views: 241
Reputation: 34395
Given this test data:
$text = '
{% a %}
{% b %}
{% a %}
{% end %}
{% end %}
{% b %}
{% end %}
{% end %}
{% c %}
{% end %}
';
This tested script does the trick:
<?php
$re = '/
    # Match nested {% a %}{% b %}...{% end %}{% end %} structures.
    \{%[ ]\w[ ]%\}            # Opening delimiter.
    (?:                       # Group for contents alternatives.
      (?R)                    # Either a nested recursive component,
    |                         # or non-recursive component stuff.
      [^{]*+                  # {normal*} Zero or more non-{
      (?:                     # Begin: "unrolling-the-loop"
        \{                    # {special} Allow a { as long
        (?!                   # as it is not the start of
          %[ ]\w[ ]%\}        # a new nested component, or
        | %[ ]end[ ]%\}       # the end of this component.
        )                     # Ok to match { followed by
        [^{]*+                # more {normal*}. (See: MRE3!)
      )*+                     # End {(special normal*)*} construct.
    )*+                       # Zero or more contents alternatives
    \{%[ ]end[ ]%\}           # Closing delimiter.
    /ix';
$count = preg_match_all($re, $text, $m);
if ($count) {
    printf("%d Matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf("\nMatch %d:\n%s\n", $i + 1, $m[0][$i]);
    }
}
?>
Here is the output:
2 Matches:
Match 1:
{% a %}
{% b %}
{% a %}
{% end %}
{% end %}
{% b %}
{% end %}
{% end %}
Match 2:
{% c %}
{% end %}
Edit: If you need to match an opening tag having more than one word char, replace the two occurrences of the \w tokens with (?!end)\w++ (as is correctly implemented in tchrist's excellent answer).
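As a rough sketch of that edit, here is the same pattern with both \w tokens swapped for (?!end)\w++. The tag names block, if and raw and the sample string are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical sample input with multi-character tag names.
$text = "{% block %}\n{% if %}\n{% end %}\n{% end %}\n{% raw %}\n{% end %}\n";

$re = '/
    \{%[ ](?!end)\w++[ ]%\}         # Opening delimiter: any name except "end".
    (?:                             # Group for contents alternatives.
      (?R)                          # Either a nested recursive component,
    |                               # or non-recursive component stuff.
      [^{]*+                        # Zero or more non-{ characters.
      (?:
        \{                          # Allow a { as long as it does not start
        (?!
          %[ ](?!end)\w++[ ]%\}     # a new nested component, or
        | %[ ]end[ ]%\}             # the end of this component.
        )
        [^{]*+
      )*+
    )*+
    \{%[ ]end[ ]%\}                 # Closing delimiter.
    /ix';

$count = preg_match_all($re, $text, $m);
printf("%d Matches:\n", $count);
foreach ($m[0] as $i => $match) {
    printf("\nMatch %d:\n%s\n", $i + 1, $match);
}
```

The (?!end) guard is what keeps {% end %} from being read as an opening tag once names may be longer than one character.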
Upvotes: 4
Reputation: 80384
Here is a demo in Perl of an approach that works for your dataset. The same should work in PHP.
#!/usr/bin/env perl
use strict;
use warnings;
my $string = <<'EO_STRING';
{% a %}
{% b %}
{% end %}
{% end %}
{% c %}
{% end %}
EO_STRING
print "MATCH: $&\n" while $string =~ m{
\{ % \s+ (?!end) \w+ \s+ % \}
(?: (?: (?! \{ % | % \} ) . ) | (?R) )*
\{ % \s+ end \s+ % \}
}xsg;
When run, that produces this:
MATCH: {% a %}
{% b %}
{% end %}
{% end %}
MATCH: {% c %}
{% end %}
There are several other ways to write that. You may have other constraints that you haven’t shown, but this should get you started.
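For PHP, a port along the same lines might look like the sketch below. It assumes the lookahead is meant to refuse both {% and %} at the current position, and it uses ~ as the delimiter; neither choice comes from the answer itself.

```php
<?php
$string = "{% a %}\n{% b %}\n{% end %}\n{% end %}\n{% c %}\n{% end %}\n";

// x = free-spacing mode, s = dot also matches newlines.
$re = '~
    \{ % \s+ (?!end) \w+ \s+ % \}             # opening tag, any name but "end"
    (?: (?: (?! \{ % | % \} ) . ) | (?R) )*   # plain characters or a nested block
    \{ % \s+ end \s+ % \}                     # closing tag
~xs';

preg_match_all($re, $string, $m);
foreach ($m[0] as $match) {
    echo "MATCH: $match\n";
}
```

preg_match_all() stands in for the Perl while loop with the g modifier: it collects every non-overlapping top-level block into $m[0].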
Upvotes: 2
Reputation: 20899
What you're looking for is called a recursive regex. PHP supports it using (?R).
I'm not familiar enough with it to be able to help you with the pattern itself, but hopefully this is a push in the right direction.
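For what it's worth, a minimal sketch of how (?R) could be used to pull out the content between each outermost opening tag and its matching {% end %} is shown here. The pattern, the capture group and the variable names are my own illustration, assuming tags always look like {% name %}.

```php
<?php
$text = "{% a %}\n{% b %}\n{% end %}\n{% end %}\n{% c %}\n{% end %}\n";

$re = '~
    \{%\s+(?!end)\w+\s+%\}        # opening tag, any name except "end"
    (                             # group 1: everything inside this block
      (?:
        (?! \{% | %\} ) .         # any character, unless {% or %} starts here
      | (?R)                      # or a complete nested block
      )*
    )
    \{%\s+end\s+%\}               # the matching closing tag
~xs';

preg_match_all($re, $text, $matches, PREG_SET_ORDER);

$inner = [];
foreach ($matches as $match) {
    $inner[] = trim($match[1]);   // drop the newlines around the content
}
print_r($inner);
```

Group 1 only reflects the outermost block of each match, because captures set inside a recursion are restored when the recursion returns. For the sample above, the first entry of $inner holds the nested b block and the second is empty.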
Upvotes: 0