Topa

Reputation: 100

Perl regex to match commas not inside parentheses or nested parentheses

I have a comma-separated string and I want to match every comma that is not inside parentheses (the parentheses are guaranteed to be balanced).

a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))

The commas between a and (b), between (b) and (d$_,c), and between (d$_,c) and ((,),d,(,)) should match, but not the commas inside (d$_,c) or ((,),d,(,)).

Note: Eventually I want to split the string by these commas.

I tried this regex, which I found elsewhere: (?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\])) but it only works for non-nested parentheses.

Upvotes: 2

Views: 498

Answers (2)

Grinnz

Reputation: 9231

A single regex for this is massively overcomplicated and difficult to maintain or extend. Here is an iterative parser approach:

use strict;
use warnings;

my $str = 'a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))';

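my $nesting = 0;   # current parenthesis depth
my $buffer = '';   # text of the field being accumulated
my @vals;
# Tokenize: each match is a single ',', '(' or ')', or a run of other characters.
while ($str =~ m/\G([,()]|[^,()]+)/g) {
  my $token = $1;
  if ($token eq ',' and !$nesting) {
    # A comma at depth 0 ends the current field.
    push @vals, $buffer;
    $buffer = '';
  } else {
    $buffer .= $token;
    if ($token eq '(') {
      $nesting++;
    } elsif ($token eq ')') {
      $nesting--;
    }
  }
}
# Keep the final field unless it is empty.
push @vals, $buffer if length $buffer;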

print "$_\n" for @vals;
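
If this is needed in more than one place, the same loop could be wrapped in a subroutine. The following is only a sketch: the name split_top_level is hypothetical, and the whitespace trimming and the unbalanced-parenthesis checks are assumptions about what a caller might want, not something the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str on commas that sit at nesting depth 0.
sub split_top_level {
    my ($str) = @_;
    my ($depth, $buffer, @fields) = (0, '');
    while ($str =~ m/\G([,()]|[^,()]+)/g) {
        my $token = $1;
        if ($token eq ',' && $depth == 0) {
            push @fields, $buffer;   # top-level comma ends the field
            $buffer = '';
            next;
        }
        $depth++ if $token eq '(';
        $depth-- if $token eq ')';
        die "Unbalanced ')' in input\n" if $depth < 0;
        $buffer .= $token;
    }
    die "Unbalanced '(' in input\n" if $depth > 0;
    push @fields, $buffer;           # last field, kept even if empty
    s/^\s+|\s+$//g for @fields;      # assumption: surrounding whitespace is unwanted
    return @fields;
}

my @fields = split_top_level('a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))');
print "$_\n" for @fields;
```

Unlike the loop above, this version keeps a trailing empty field and dies on unbalanced input, which may or may not suit your data.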

You can use Parser::MGC to construct this sort of parser more abstractly.

Upvotes: 1

Wiktor Stribiżew

Reputation: 626758

You may use

(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,

See the regex demo

Details

  • (\((?:[^()]++|(?1))*\)) - Capturing group 1: matches a substring between balanced parentheses:
    • \( - a ( char
    • (?:[^()]++|(?1))* - zero or more occurrences of 1+ chars other than ( and ) or the whole Group 1 pattern (due to the regex subroutine (?1) that is necessary here since only a part of the whole regex pattern is recursed)
    • \) - a ) char.
  • (*SKIP)(*F) - omits the found match and starts the next search from the end of the match
  • | - or
  • , - matches a comma outside nested parentheses.
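
Since the goal is to split the string, here is one way the pattern might be used in Perl. It is a sketch, not the only option: passing the pattern straight to split would also return the (undefined) Group 1 capture as an extra field for every comma, so this version walks the matches and slices the string using the match offsets in @- and @+ instead. The variable names and the whitespace trimming are assumptions.

```perl
use strict;
use warnings;

my $str = 'a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))';

# Group 1 consumes a balanced (...) span, then (*SKIP)(*F) discards it,
# so the only successful matches are commas outside parentheses.
my $re = qr/(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,/;

my @fields;
my $start = 0;
while ($str =~ /$re/g) {
    # $-[0] and $+[0] are the start and end offsets of the matched comma.
    push @fields, substr($str, $start, $-[0] - $start);
    $start = $+[0];
}
push @fields, substr($str, $start);   # text after the last top-level comma

s/^\s+|\s+$//g for @fields;           # assumption: trim surrounding whitespace

print "$_\n" for @fields;
```

The recursion and the backtracking control verbs require Perl 5.10 or later.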

Upvotes: 5
