Reputation: 936
I have this in a function which is supposed to replace any sequence of parentheses with what is enclosed in it like (abc)
becomes abc
any where it appears even recursively because parens can be nested.
$return = preg_replace_callback(
'|(\((.+)\))+|',
function ($matches) {
return $matches[2];
},
$s
);
when the above regex is fed this string "a(bcdefghijkl(mno)p)q"
as input it returns "ap)onm(lkjihgfedcbq"
. This shows the regex is matched once. What can I do to make it continue to match even inside already made matches and produce this `abcdefghijklmnopq'"
Upvotes: 2
Views: 825
Reputation: 626728
To match balanced parenthetical substrings you may use a well-known \((?:[^()]++|(?R))*\)
pattern (described in Matching Balanced Constructs), inside a preg_replace_callback
method, where the match value can be further manipulated (just remove all (
and )
symbols from the match that is easy to do even without a regex:
$re = '/\((?:[^()]++|(?R))*\)/';
$str = 'a(bcdefghijkl(mno)p)q((('; // Added three ( at the end
$result = preg_replace_callback($re, function($m) {
return str_replace(array('(',')'), '', $m[0]);
}, $str);
echo $result; // => abcdefghijklmnopq(((
See the PHP demo
To get overlapping matches, you need to use a known technique, capturing inside a positive lookahead, but you won't be able to perform two operations at once (replacing and matching), you can run matching first, and then replace:
$re = '/(?=(\((?:[^()]++|(?1))*\)))/';
$str = 'a(bcdefghijkl(mno)p)q(((';
preg_match_all($re, $str, $m);
print_r($m[1]);
// => Array ( [0] => (bcdefghijkl(mno)p) [1] => (mno) )
See the PHP demo.
Upvotes: 2
Reputation: 871
It is slightly unclear exactly what the postcondition of the algorithm is supposed to be. It seems to me that you are wanting to strip out matching pairs of ( )
. The assumption here is that unmatched parentheses are left alone (otherwise you just strip out all of the (
's and )
's).
So I guess this means the input string a(bcdefghijkl(mno)p)q
becomes abcdefghijklmnopq
but the input string a(bcdefghijkl(mno)pq
becomes a(bcdefghijklmnopq
. Likewise an input string (a))
would become a)
.
It may be possible to do this using pcre
since it does provide some non-regular features but I'm doubtful about it. The language of the input strings is not regular; it's context-free. What @ArtisticPhoenix's answer does is match complete pairs of matched parentheses. What it does not do is match all nested pairs. This nested matching is inherently non-regular in my humble understanding of language theory.
I suggest writing a parser to strip out the matching pairs of parentheses. It gets a little wordy having to account for productions that fail to match:
<?php
// Parse the punctuator sub-expression (i.e. anything within ( ... ) ).
function parse_punc(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$inner = parse_punc_seq($tokens,$iter);
if (!isset($tokens[$iter]) || $tokens[$iter] != ')') {
// Leave unmatched open parentheses alone.
$inner = "($inner";
}
$iter += 1;
return $inner;
}
// Parse a sequence (inside punctuators).
function parse_punc_seq(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$tok = $tokens[$iter];
if ($tok == ')') {
return;
}
$iter += 1;
if ($tok == '(') {
$tok = parse_punc($tokens,$iter);
}
$tok .= parse_punc_seq($tokens,$iter);
return $tok;
}
// Parse a sequence (outside punctuators).
function parse_seq(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$tok = $tokens[$iter++];
if ($tok == '(') {
$tok = parse_punc($tokens,$iter);
}
$tok .= parse_seq($tokens,$iter);
return $tok;
}
// Wrapper for parser.
function parse(array $tokens) {
$iter = 0;
return strval(parse_seq($tokens,$iter));
}
// Grab input from stdin and run it through the parser.
$str = trim(stream_get_contents(STDIN));
$tokens = preg_split('/([\(\)])/',$str,-1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump(parse($tokens));
I know this is a lot more code than a regex one-liner but it does solve the problem as I understand it. I'd be interested to know if anyone can solve this problem with a regular expression.
Upvotes: 1
Reputation: 21661
Try this one,
preg_match('/\((?:[^\(\)]*+|(?0))*\)/', $str )
https://regex101.com/r/NsQSla/1
It will match everything inside of the (
)
as long as they are matched pairs.
Example
(abc) (abc (abc))
will have the following matches
Match 1
Full match 0-5 `(abc)`
Match 2
Full match 6-17 `(abc (abc))`
Upvotes: 1