Reputation: 51
I got a string like 1(8()3(6()7())9()3())2(4())3()1(0()3())
which is representing a tree.
A bracket appears, if we go one level deeper. Numbers on the same level are neighbours.
Now want to add nodes, for example I want to add a 5
to every path where we have a 1
on the first and a 3
on the second level, so I want to put a 5()
after every 3(
which is inside of a 1(
.
So 5()
has to be added 3 times and the result should be 1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
The Problem is, that I don't get the code working with the PCRE recursion. If I match a tree representation string without and fixed paths like 1(
and 3(
it works, but when I generate a regex with those fixed patterns, it doesn't work.
This is my code:
<?php
header('content-type: text/plain;utf-8');
$node = [1, 3, 5];
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';
echo $path.'
';
$nes = '\((((?>[^()]+)|(?R))*)\)';
$nes = '('.$nes.')*';
echo preg_match('/'.$nes.'/x', $path) ? 'matches' : 'matches not';
echo '
';
// creates a regex with the fixed path structure, but allows nested elements in between
// in this example something like: /^anyNestedElementsHere 1( anyNestedElementsHere 3( anyNestedElementsHere ))/
$re = $nes;
for ($i = 0; $i < count($node)-1; $i++) {
$re .= $node[$i].'\(';
if ($i != count($node)-2)
$re .= $nes;
}
$re = '/^('.$re.')/x';
echo str_replace($nes, ' '.$nes.' ', $re).'
';
echo preg_match($re, $path) ? 'matches' : 'matches not';
echo '
';
// append 5()
echo preg_replace($re, '${1}'.$node[count($node)-1].'()', $path);
?>
And this is the output, where you can see how the generated regex looks like:
1(8()3(6()7())9()3())2(4())3()1(0()3())
matches
/^( (\((((?>[^()]+)|(?R))*)\))* 1\( (\((((?>[^()]+)|(?R))*)\))* 3\()/x
matches not
1(8()3(6()7())9()3())2(4())3()1(0()3())
I hope you understand my problem and hope you can tell me, where my error is.
Thanks a lot!
Upvotes: 2
Views: 310
Reputation: 6511
Regex
The following regex matches nested brackets recursively, finding an opening 1(
on the first level, and an opening 3(
on the second level (as a direct child). It also attempts successive matches, either on the same level or going down the respective levels to find another match.
~
(?(?=\A) # IF: First match attempt (if at start of string) - -
# we are on 1st level => find next "1("
(?<balanced_brackets>
# consumes balanced brackets recursively where there is no match
[^()]*+
\( (?&balanced_brackets)*? \)
)*?
# match "1(" => enter level 2
1\(
| # ELSE: Successive matches - - - - - - - - - - - - - -
\G # Start at end of last match (level 3)
# Go down to level 2 - match ")"
(?&balanced_brackets)*?
\)
# or go back to level 1 - matching another ")"
(?>
(?&balanced_brackets)*?
\)
# and enter level 2 again
(?&balanced_brackets)*?
1\(
)*?
) # - - - - - - - - - - - -
# we are on level 2 => consume balanced brackets and match "3("
(?&balanced_brackets)*?
3\K\( # also reset the start of the match
~x
Replacement
(5()
Text
Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())
Output:
1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
^^^ ^^^ ^^^
[1] [2] [3]
We start by using a conditional subpattern
to distinguish between:
\G assertion
).(?(?=\A) # IF followed by start of string
# This is the first attempt
| # ELSE
# This is another attempt
\G # and we'll anchor it to the end of last match
)
For the first match, we'll consume all nested brackets that don't match 1(
, in order to get the cursor to a position in the first level where it could find a successful match.
Recursion
and Subroutines
.(?<balanced_brackets> # ANY NUMBER OF BALANCED BRACKETS
[^()]*+ # match any characters
\( # opening bracket
(?&balanced_brackets)*? # with nested bracket (recursively)
\) # closing bracket in the main level
)*? # Repeated any times (lazy)
Notice this is a named group
that we will use as a subroutine call many times in the pattern to consume unwanted balanced brackets, as (?&balanced_brackets)*?
.
Next levels. Now, to enter level 2, we need to match:
1\(
And finally, we'll consume any balanced brackets until we find the opening of the 3rd level:
(?&balanced_brackets)*?
3\(
That's it. We've just matched our first occurrence, so we can insert the replacement text in that position.
Next match. For the successive match attempts, we can either:
)
to find another occurrence of 3(
)
and, from there, match using the same strategy as we used for the first match.This is achieved with the following subpattern:
\G # anchored to the end of last match (level 3)
(?&balanced_brackets)*? # consume any balanced brackets
\) # go down to level 2
#
(?> # And optionally
(?&balanced_brackets)*? # consume level 2 brackets
\) # to go down to level 1
(?&balanced_brackets)*? # consume level 1 brackets
1\( # and go up to level 2 again
)*? # As many times as it needs to (lazy)
To conclude, we can match the opening of the 3rd level:
(?&balanced_brackets)*?
3\(
We'll also reset the start of match near the end of the pattern, with \K
, to only match the last opening bracket. Thus, we can simply replace with (5()
, avoiding the use of backreferences.
We only need to call preg_replace()
with the same values used above.
Why did your regex fail?
Since you asked, the pattern is anchored to the start of string. It can only match the first occurrence.
/^( (\((((?>[^()]+)|(?R))*)\))* 1\( (\((((?>[^()]+)|(?R))*)\))* 3\()/x
Also, it doesn't match the first occurrence because the construct (?R)
recurses the the whole pattern (trying to match ^
again). We could change (?R)
to (?2)
.
The main reason, though, is because it is not consuming the characters before any opening \(
. For example:
Input:
1(8()3(6()7())9()3())2(4())3()1(0()3())
^
#this "8" can't be consumed with the pattern
There's also a behaviour that should be considered: PCRE treats recursion as atomic. So you have to make sure that the pattern will consume unwanted brackets as in the above example, but also avoid matching 1(
or 3(
in their respective levels.
Upvotes: 2
Reputation: 51330
I'd break down this problem into two smaller parts:
First, extract the 1
nodes, using the following regex:
(?(DEFINE)
(?<tree>
(?: \d+ \( (?&tree) \) )*
)
)
\b 1 \( (?&tree) \)
Use preg_replace_callback
for this. This will match 1(8()3(6()7())9()3())
and 1(0()3())
.
Next, it's just a matter of replacing 3(
with 3(5()
and you're done.
Example in PHP:
$path = '1(8()3(6()7())9()3())2(4())3()1(0()3())';
$path = preg_replace_callback('#
(?(DEFINE)
(?<tree>
(?: \d+ \( (?&tree) \) )*
)
)
\b 1 \( (?&tree) \)
#x', function($m) {
return str_replace('3(', '3(5()', $m[0]);
}, $path);
The result is: 1(8()3(5()6()7())9()3(5()))2(4())3()1(0()3(5()))
Upvotes: 1