Reputation: 4541
My situation requires recursion, and I'm able to match what's in the curly brackets already the way I need it, but I'm unable to capture the surrounding text.
So this would be the example text:
This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo
And I need my result to look like this:
0 => This is foo
1 => {{foo}}
2 => and
3 => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
4 => more_text {{foo
With this: (\{\{([^{{}}]|(?R))*\}\})
I have been able to match {{foo}}
and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
very nicely, but not the surrounding text to achieve the result that I need.
I have tried many things, but without success.
Upvotes: 1
Views: 47
Reputation: 627507
You can use the following solution, based on preg_split with the PREG_SPLIT_DELIM_CAPTURE flag:
$re = '/({{(?:[^{}]++|(?R))*}})/';
$str = 'This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo';
$res = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array
(
[0] => This is foo
[1] => {{foo}}
[2] => and
[3] => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
[4] => more_text {{foo
)
See the PHP demo.
The whole pattern is wrapped in the outer capturing group, which is why adding PREG_SPLIT_DELIM_CAPTURE puts the text that is split upon into the output array. If there are unwanted empty elements, the PREG_SPLIT_NO_EMPTY flag will discard them.
More details:
Pattern: I removed unnecessary escapes and symbols from your pattern. You do not have to escape { and } in a PHP regex when the context is enough for the regex engine to deduce the meaning of { (and you do not need to escape } at all, in any context). Note that [{}] is the same as [{{}}]: both match a single char that is either a { or }, no matter how many { and } you put into the character class. I also enhanced its performance by matching runs of non-brace characters with the possessive quantifier ++ instead of one character per iteration.
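To check the character-class claim yourself, here is a small sketch. The sample strings and variable names are made up for illustration; only preg_match is used.

```php
<?php
// Both negated classes exclude exactly two characters: { and }.
$single  = '/^[^{}]+$/';
$doubled = '/^[^{{}}]+$/';

$results = [];
foreach (['foo', 'fo{o', 'a}b', 'x == "demo"'] as $sample) {
    // preg_match returns 1 on a match and 0 otherwise
    $results[$sample] = [preg_match($single, $sample), preg_match($doubled, $sample)];
    printf("%-12s single=%d doubled=%d\n", $sample, ...$results[$sample]);
}
```

Both columns should agree for every sample, since repeating a character inside a class does not change the set it describes.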
Details:
- ( - Group 1 start:
- {{ - 2 consecutive {s
- (?:[^{}]++|(?R))* - 0 or more sequences of:
  - [^{}]++ - 1 or more symbols other than { and } (no backtracking into this pattern is allowed)
  - | - or
  - (?R) - try matching the whole pattern
- }} - a }} substring
- ) - Group 1 end.

PHP part:
When tokenizing a string using just one token type, it is easy to use a splitting approach. Since preg_split in PHP can split on a regex while keeping the text that is matched, it is ideal for this kind of task.
The only trouble is that empty entries may creep into the resulting array when matches are consecutive or sit at the start/end of the string. Thus, PREG_SPLIT_NO_EMPTY is good to use here.
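To see where those empty entries come from, here is a minimal sketch with a made-up input that starts with a match and has two matches back to back. The input string and variable names are assumptions for illustration.

```php
<?php
$re  = '/({{(?:[^{}]++|(?R))*}})/';
$str = '{{a}}{{b}} tail'; // hypothetical input: match at the start, then an adjacent match

$withEmpty = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$noEmpty   = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($withEmpty); // includes '' before {{a}} and '' between {{a}} and {{b}}
echo "\n";
var_export($noEmpty);   // only the three non-empty pieces remain
echo "\n";
```

Without the flag, the text "between" two adjacent delimiters is an empty string, and preg_split reports it faithfully.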
Upvotes: 1
Reputation: 21681
I would use a pattern like this:
$patt = '/(?P<open>\{\{)|(?P<body>[-0-9a-zA-Z._]+)|(?P<whitespace>\s+)|(?P<operators>and|or|==)|(?P<close>\}\})/';
preg_match_all( $patt, $text, $matches );
The output is far too long to show in full, but you can loop over it and match items up; basically it's tokenizing the string.
It looks like this:
array (
0 =>
array (
0 => '{{',
1 => 'bar.function',
2 => '{{',
3 => 'demo.funtion',
4 => '{{',
5 => 'inner',
6 => '}}',
7 => ' ',
8 => '==',
9 => ' ',
10 => 'demo',
11 => '}}',
12 => ' ',
13 => 'and',
14 => ' ',
15 => '{{',
16 => 'bar',
17 => '}}',
18 => ' ',
19 => 'or',
20 => ' ',
21 => 'foo',
22 => '}}',
),
'open' =>
array (
0 => '{{',
1 => '',
2 => '{{',
3 => '',
4 => '{{',
5 => '',
6 => '',
7 => '',
8 => '',
9 => '',
10 => '',
11 => '',
12 => '',
13 => '',
14 => '',
15 => '{{',
16 => '',
17 => '',
18 => '',
19 => '',
20 => '',
21 => '',
22 => '',
),
'body' =>
array (
0 => '',
1 => 'bar.function',
2 => '',
3 => 'demo.funtion',
4 => '',
5 => 'inner',
6 => '',
....
)
)
Then in a loop you can tell that match [0][0] is an open tag, match [0][1] is a body, match [0][2] is another open, etc., and by keeping track of open and close tags you can work out the nesting. It tells you what is an open match, a body match, a close match, an operator match, and so on.
That is everything you need; I don't have time for a full workup of a solution...
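A rough sketch of that open/close bookkeeping, not a full solution. It uses a reduced pattern (open, close and body only) and PREG_SET_ORDER so each match arrives as one row; the variable names are assumptions of this sketch.

```php
<?php
$patt = '/(?P<open>\{\{)|(?P<close>\}\})|(?P<body>[-0-9a-zA-Z._]+)/';
$text = '{{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}';

preg_match_all($patt, $text, $matches, PREG_SET_ORDER);

$depth = 0;
$maxDepth = 0;
$bodyDepths = []; // list of [body text, nesting depth]
foreach ($matches as $m) {
    // Groups that did not take part in the match are '' or missing entirely
    if (($m['open'] ?? '') !== '') {
        $depth++;
        $maxDepth = max($maxDepth, $depth);
    } elseif (($m['close'] ?? '') !== '') {
        $depth--;
    } elseif (($m['body'] ?? '') !== '') {
        $bodyDepths[] = [$m['body'], $depth];
        echo str_repeat('  ', $depth), $m['body'], "\n"; // indent by nesting level
    }
}
```

If $depth is not back at 0 after the loop, the tags are unbalanced, which is one cheap sanity check this approach gives you.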
A quick example: an open followed by a body followed by a close is a variable, and an open followed by a body and another open is a function.
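That rule could be sketched roughly as follows. The reduced pattern, the [type, text] token shape and the $kinds array are assumptions made for this illustration, and characters the pattern does not match (such as the parenthesis) are simply skipped.

```php
<?php
$patt = '/(?P<open>\{\{)|(?P<close>\}\})|(?P<body>[-0-9a-zA-Z._]+)/';
$text = '{{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}';

preg_match_all($patt, $text, $matches, PREG_SET_ORDER);

// Reduce every match to a [type, text] pair
$tokens = [];
foreach ($matches as $m) {
    foreach (['open', 'close', 'body'] as $type) {
        if (($m[$type] ?? '') !== '') {
            $tokens[] = [$type, $m[$type]];
            break;
        }
    }
}

// Slide a window of three tokens over the list and apply the rule
$kinds = [];
for ($i = 0; $i + 2 < count($tokens); $i++) {
    [$a, $b, $c] = [$tokens[$i], $tokens[$i + 1], $tokens[$i + 2]];
    if ($a[0] === 'open' && $b[0] === 'body') {
        if ($c[0] === 'close') {
            $kinds[$b[1]] = 'variable';
        } elseif ($c[0] === 'open') {
            $kinds[$b[1]] = 'function';
        }
    }
}
print_r($kinds);
```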
You can also add additional patterns by inserting them like this: (?P<function>function\.) with the pipe in there, like '/(?P<open>\{\{)|(?P<function>function\.)|... . Then you could pick up keywords like function, foreach, block, etc.
I've written full-fledged template systems with this method. In my template system I build the regex in an array like this:
[ 'open' => '\{\{', 'function' => 'function\.', .... ]
And then compile it into the actual regex, which makes life easy:
$r = [];
foreach( $patt_array as $key=>$value ){
$r[] = '(?P<'.$key.'>'.$value.')';
}
$patt = '/'.implode('|', $r ).'/';
Etc...
If you follow.
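To make that concrete, here is a minimal end-to-end sketch. The token table and the sample text are hypothetical; the build loop is the same as the one shown earlier in this answer.

```php
<?php
// Hypothetical token table. Order matters: alternation tries the
// branches left to right, so earlier entries win on a tie.
$patt_array = [
    'open'       => '\{\{',
    'close'      => '\}\}',
    'whitespace' => '\s+',
    'body'       => '[-0-9a-zA-Z._]+',
];

$r = [];
foreach ($patt_array as $key => $value) {
    $r[] = '(?P<' . $key . '>' . $value . ')';
}
$patt = '/' . implode('|', $r) . '/';

preg_match_all($patt, '{{foo}} {{bar}}', $matches, PREG_SET_ORDER);

// The name of the group that matched is the token type
$types = [];
foreach ($matches as $m) {
    foreach (array_keys($patt_array) as $key) {
        if (($m[$key] ?? '') !== '') {
            $types[] = $key;
            break;
        }
    }
}
echo implode(' ', $types), "\n";
```

Keeping the table as data means adding a keyword is a one-line change, and the type lookup never has to be touched.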
Upvotes: 1