Reputation: 100
I have a comma-separated string and I want to match every comma that is not inside parentheses (the parentheses are guaranteed to be balanced).
a , (b) , (d$_,c) , ((,),d,(,))
The commas between a and (b), between (b) and (d$_,c), and between (d$_,c) and ((,),d,(,)) should match, but not the ones inside (d$_,c) or ((,),d,(,)).
Note: Eventually I want to split the string by these commas.
I tried this regex:
(?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\]))
from here, but it only works for non-nested parentheses.
Upvotes: 2
Views: 498
Reputation: 9231
A single regex for this is massively overcomplicated and difficult to maintain or extend. Here is an iterative parser approach:
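use strict;
use warnings;

my $str = 'a , (b) , (d$_,c) , ((,),d,(,))';
my $nesting = 0;   # current parenthesis depth
my $buffer = '';   # text of the field being collected
my @vals;

# Tokenize into single ',' '(' ')' characters or runs of anything else
while ($str =~ m/\G([,()]|[^,()]+)/g) {
    my $token = $1;
    if ($token eq ',' and !$nesting) {
        # Top-level comma: the current field is complete
        push @vals, $buffer;
        $buffer = '';
    } else {
        $buffer .= $token;
        if ($token eq '(') {
            $nesting++;
        } elsif ($token eq ')') {
            $nesting--;
        }
    }
}
push @vals, $buffer if length $buffer;   # last field has no trailing comma
print "$_\n" for @vals;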
You can use Parser::MGC to construct this sort of parser more abstractly.
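If you want this as a reusable routine, the same depth-counting idea could be wrapped in a sub along these lines. This is only a sketch: the name split_top_level is made up for illustration, and the whitespace trimming and the unbalanced-parenthesis checks are additions that the code above does not have. Unlike the code above, it also keeps an empty last field.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the answer above): split $str on
# commas at nesting depth 0 and trim whitespace around each field.
sub split_top_level {
    my ($str) = @_;
    my ($depth, $buffer, @fields) = (0, '');
    for my $char (split //, $str) {
        if ($char eq ',' && $depth == 0) {
            # Top-level comma: close the current field
            push @fields, $buffer;
            $buffer = '';
            next;
        }
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        die "unbalanced ')' in input\n" if $depth < 0;
        $buffer .= $char;
    }
    die "unbalanced '(' in input\n" if $depth > 0;
    push @fields, $buffer;           # last field (kept even when empty)
    s/^\s+|\s+$//g for @fields;      # trimming is an added assumption
    return @fields;
}

print "$_\n" for split_top_level('a , (b) , (d$_,c) , ((,),d,(,))');
```

For the string from the question this should print a, (b), (d$_,c) and ((,),d,(,)) on separate lines.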
Upvotes: 1
Reputation: 626758
You may use
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,
See the regex demo
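Since the goal is to split on these commas, here is a rough sketch of how the pattern might be used with split in Perl (5.10 or later, for recursion and the backtracking verbs). One thing to watch: split also returns whatever the capturing groups captured, and Group 1 never takes part in a successful match here, so it comes back as undef for every separator and has to be filtered out. The variable names and the whitespace trimming are my own additions.

```perl
use strict;
use warnings;

my $str = 'a , (b) , (d$_,c) , ((,),d,(,))';

# Same pattern as above: balanced (...) groups are matched and then thrown
# away via (*SKIP)(*F), so only top-level commas act as separators.
my $sep = qr/(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,/;

# split adds one extra element per capturing group for each separator;
# group 1 is always undef when the comma branch matches, so drop those.
my @fields = grep { defined } split $sep, $str;

# Trim surrounding whitespace (an assumption about the desired output).
s/^\s+|\s+$//g for @fields;

print "[$_]\n" for @fields;
```

With the sample string this should give four fields: [a], [(b)], [(d$_,c)] and [((,),d,(,))].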
Details
(\((?:[^()]++|(?1))*\)) - Capturing group 1: matches a substring between balanced parentheses:
    \( - a ( char
    (?:[^()]++|(?1))* - zero or more occurrences of 1+ chars other than ( and ), or the whole Group 1 pattern (due to the regex subroutine (?1), which is necessary here since only a part of the whole regex pattern is recursed)
    \) - a ) char.
(*SKIP)(*F) - omits the found match and starts the next search from the end of the match
| - or
, - matches a comma outside nested parentheses.
Upvotes: 5