Spacemoose

Reputation: 4016

Split into words by an uncommented comma that is not inside matching parentheses

Consider the following string:

blah, foo(a,b), bar(c,d), yo

I want to extract a list of strings:

blah
foo(a,b)
bar(c,d)
yo

It seems to me that I should be able to use quote words here, but I'm struggling with the regex. Can someone help me out?

Upvotes: 2

Views: 78

Answers (3)

serenesat

Reputation: 4709

Borodin gave a solution to one of your earlier questions (which is similar to this one). A small change to that regex gives you the desired output (note that this will not work for nested parentheses):

use strict;
use warnings;
use 5.010;

my $line = q<blah, foo(a,b), bar(c,d), yo>;

my @words = $line =~ / (?: \([^)]*\) | [^,] )+ /xg;

say for @words;

Output:

blah
 foo(a,b)
 bar(c,d)
 yo
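As the output shows, every word after the first keeps its leading space, because [^,] also matches whitespace. Here is a minimal sketch that trims each match afterwards. It assumes Perl 5.14 or later for the non-destructive /r substitution modifier:

```perl
use strict;
use warnings;
use 5.014;    # needed for the /r (non-destructive) substitution modifier

my $line = q<blah, foo(a,b), bar(c,d), yo>;

# Same pattern as above: a run of either (...) groups or non-comma characters.
# map then strips leading/trailing whitespace from each match.
my @words = map { s/^\s+|\s+$//gr } $line =~ / (?: \([^)]*\) | [^,] )+ /xg;

say for @words;
```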

Upvotes: 1

lynn

Reputation: 10784

Perl supports regex recursion ((?R) re-enters the whole pattern), so you might be able to look for:

  • either a bare word like blah containing no parentheses (\w+)

  • or a "call" like foo(a,b), whose arguments are themselves matched recursively: \w+\((?R)(, *(?R))*\)

Combining the two, with the argument list made optional, the total regex is (\w+(\((?R)(, ?(?R))*\))?), which seems to work when applied as a global match (m//g).
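A minimal sketch of how this regex might be applied, assuming a global match in list context (Perl 5.10 or later for (?R)). The capturing groups are turned into non-capturing (?:...) groups here; otherwise a list-context m//g would return the contents of every group instead of just the whole matches. The input string with a nested call is a hypothetical example:

```perl
use strict;
use warnings;
use 5.010;    # (?R) recursion and say

# Hypothetical input with one nested call, to exercise the recursion
my $line = q<blah, foo(a, baz(b,c)), yo>;

# lynn's pattern with (...) groups made non-capturing, so that m//g in
# list context returns only the whole matches. No /x here: the ", ?"
# (comma, optional space) must stay literal.
my @words = $line =~ /\w+(?:\((?R)(?:, ?(?R))*\))?/g;

say for @words;
```

Note that every argument must itself be a word or a call: an empty call like foo() or a bare parenthesised group like (a,b) will not be matched as a whole.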

Upvotes: 3

Wiktor Stribiżew

Reputation: 626738

You can use the following regex with split:

\([^()]*\)(*SKIP)(*F)|\s*,\s*

With \([^()]*\), we match a ( followed by zero or more characters other than ( or ), and then a ). If such a parenthetical construction is found, (*SKIP)(*F) fails the match and makes the engine resume after it, so only a comma surrounded by optional whitespace, outside parentheses, is matched as a delimiter.


#!/usr/bin/perl
use strict;
use warnings;

my $string = "blah, foo(a,b), bar(c,d), yo";
# Skip over any (...) group; split only on commas outside parentheses
my @string = split /\([^()]*\)(*SKIP)(*F)|\s*,\s*/, $string;

foreach (@string) {
    print "$_\n";
}
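To see why the (*SKIP)(*F) alternative is needed, the sketch below compares it with a naive split on every comma, using the same input (backtracking verbs need Perl 5.10 or later):

```perl
use strict;
use warnings;
use 5.010;    # backtracking control verbs such as (*SKIP) and (*F)

my $string = "blah, foo(a,b), bar(c,d), yo";

# Naive split: every comma is a delimiter, including those inside (...),
# giving ("blah", "foo(a", "b)", "bar(c", "d)", "yo")
my @naive = split /\s*,\s*/, $string;

# With the (*SKIP)(*F) alternative, a (...) group is consumed, the attempt
# is failed, and matching resumes after the group, so commas inside it are
# never used as delimiters: ("blah", "foo(a,b)", "bar(c,d)", "yo")
my @fields = split /\([^()]*\)(*SKIP)(*F)|\s*,\s*/, $string;

print scalar(@naive), " fields vs ", scalar(@fields), " fields\n";
```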

To account for commas inside nested balanced parentheses, you can use

my @string = split /\((?>[^()]|(?R))*\)(*SKIP)(*F)|\s*,\s*/, $string;


With \((?>[^()]|(?R))*\), we match any balanced (...) group and fail the match with the (*SKIP)(*F) verbs when one is found; otherwise we match a comma with optional surrounding whitespace (so there is no need to trim the strings manually afterwards).

For the string blah, foo(b, (a,b)), bar(c,d), yo, the result is:

blah
foo(b, (a,b))
bar(c,d)
yo
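If the recursive pattern feels hard to maintain, the same top-level split can be done without regex verbs by counting parenthesis depth. This is a swapped-in technique rather than part of the answer above, and split_top_level is a hypothetical helper name. It assumes the parentheses are balanced; a stray ) at depth 0 is just kept as an ordinary character:

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit at parenthesis depth 0,
# tracking the depth character by character instead of with a regex.
sub split_top_level {
    my ($str) = @_;
    my @fields;
    my $depth   = 0;
    my $current = '';
    for my $ch (split //, $str) {
        if ($ch eq ',' && $depth == 0) {
            push @fields, $current;    # top-level comma ends a field
            $current = '';
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')' && $depth > 0;    # never go negative
        $current .= $ch;
    }
    push @fields, $current;    # last field has no trailing comma
    s/^\s+|\s+$//g for @fields;    # trim surrounding whitespace in place
    return @fields;
}

my @fields = split_top_level("blah, foo(b, (a,b)), bar(c,d), yo");
print "$_\n" for @fields;
```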

Upvotes: 1
