Amelio Vazquez-Reina

Reputation: 96284

Splitting a string around outer delimiters, respecting character groups

Say I have a string:

my $string = "foo{a},bar{b}, baz{c,d,e}";

that uses a set of grouping characters to distinguish between two levels:

my $grouping_characters = "{}";

I would like to split this string on the "outer" commas, i.e. the commas that are not inside a pair of $grouping_characters.

For the example above, the output should be:

my @result = ("foo{a}", "bar{b}", "baz{c,d,e}");

How do I do this in Perl?

Upvotes: 2

Views: 271

Answers (5)

Vijay

Reputation: 67221

> echo "foo{a},bar{b}, baz{c,d,e}" | perl -lne 'print for /\s*(.*?\{.*?\}),?/g'
foo{a}
bar{b}
baz{c,d,e}
>

Upvotes: 0

choroba

Reputation: 241868

Simple parser:

#!/usr/bin/perl
use warnings;
use strict;

my $string = 'foo{a},bar{b}, baz{c,d,e}';
my @parts;

my $inside = 0;    # nesting depth of {...}
my $from   = 0;    # start offset of the current part
for my $i (0 .. length($string) - 1) {

    my $char = substr $string, $i, 1;

    if ('{' eq $char) {
        $inside++;

    } elsif ('}' eq $char) {
        $inside--;

    } elsif (',' eq $char and not $inside) {
        # An outer comma: everything since $from is one part.
        push @parts, substr $string, $from, $i - $from;
        $from = $i + 1;
    }
}

push @parts, substr $string, $from;
print "$_\n" for @parts;

Removing the whitespace is left as an exercise for the reader.
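For the exercise, here is one possible sketch that wraps the same character-by-character loop in a sub, takes the grouping characters as parameters (mirroring the question's $grouping_characters), and trims whitespace from each part. The sub name split_outer is hypothetical, and ignoring a stray closing brace is an added assumption, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string on commas that are not inside
# $open ... $close pairs, then trim surrounding whitespace from each part.
sub split_outer {
    my ($string, $open, $close) = @_;
    my @parts;
    my $depth = 0;    # current nesting level of grouping characters
    my $from  = 0;    # start offset of the current part
    for my $i (0 .. length($string) - 1) {
        my $char = substr $string, $i, 1;
        if ($char eq $open) {
            $depth++;
        } elsif ($char eq $close) {
            $depth-- if $depth > 0;    # assumption: ignore unbalanced closers
        } elsif ($char eq ',' and $depth == 0) {
            push @parts, substr $string, $from, $i - $from;
            $from = $i + 1;
        }
    }
    push @parts, substr $string, $from;    # the last part has no trailing comma
    s/^\s+|\s+$//g for @parts;             # trim each part in place
    return @parts;
}

my ($open, $close) = split //, "{}";      # the question's $grouping_characters
my @result = split_outer("foo{a},bar{b}, baz{c,d,e}", $open, $close);
print "$_\n" for @result;
```

Because the sub tracks a depth counter rather than a flag, nested groups such as a{b{c,d}},e are also kept together.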

Upvotes: 1

DarkAjax

Reputation: 16223

You could try:

my $string = "foo{a},bar{b}, baz{c,d,e}";

my @result = split /,\s*(?=\w+\{[a-z,]+\})/, $string;
print "$_\n" for @result;

Upvotes: 1

amon

Reputation: 57600

First: If you want to properly parse some programming language or configuration format, you may want to use an actual parser.

However, your task can be accomplished with regexes. But instead of writing a regex that matches the commas we want to split on, we write a regex that matches the parts of the list:

my $regex = qr/
  \w+           # item can begin with some identifier
  \{ [^\}]* \}  # followed by some stuff in braces
  [,;]          # must end with comma or semicolon
/x;

We can then extract the matches like this:

use Data::Dump qw(dd);

my $string = "foo{a},bar{b}, baz{c,d,e};";
my @result = $string =~ /$regex/g;
dd @result;

Output:

("foo{a},", "bar{b},", "baz{c,d,e};")

Pretty nice. Now, we refine our regex in three ways:

  1. The comma isn't part of the matched string.
  2. We make sure the matches are adjacent and that no garbage is in between.
  3. We make the delimiters pluggable in the most trivial way: we interpolate some values into a character class.

Together:

my $delims = quotemeta "{}";
my $regex = qr/
    \w+
    [$delims] [^$delims]* [$delims]
/x;

my $string = "foo{a},bar{b}, baz{c,d,e};";
my @result = $string =~ /\G ($regex) [,;] \s*/xg;
dd @result;

The \G assertion anchors each match to the position where the previous match left off.
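To see what \G buys, here is a small sketch that runs the same item pattern with and without \G over a string that has a stray word between two items. The input string is made up for illustration.

```perl
use strict;
use warnings;

my $delims = quotemeta "{}";
my $regex  = qr/ \w+ [$delims] [^$delims]* [$delims] /x;

# "junk" between the first and second item is not a valid item.
my $string = "foo{a}, junk bar{b};";

my @anchored   = $string =~ /\G ($regex) [,;] \s*/xg;   # stops at "junk"
my @unanchored = $string =~ /   ($regex) [,;] \s*/xg;   # silently skips "junk"

print "anchored:   @anchored\n";
print "unanchored: @unanchored\n";
```

Without \G the engine is free to scan forward past the garbage; with \G it has to continue exactly where the previous item ended, so matching stops at the first thing that is not a valid item.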

Output:

("foo{a}", "bar{b}", "baz{c,d,e}")

Wonderful. This could be refined further in two ways:

  1. The stuff in the braces is allowed to recurse
  2. We differentiate between opening and closing delims, and only allow correct pairs. As it is, foo}a{ would be a valid item.

If neither refinement is needed, the current regex should do fine.
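If both refinements are needed, one possible sketch uses Perl's recursive subpatterns ((?&name), available since Perl 5.10). It assumes { and } are the only grouping pair and that every item looks like identifier{...}; the nested input string is invented for illustration.

```perl
use strict;
use warnings;

# Assumption: items are identifier{...} and braces are the only grouping pair.
my $string = "foo{a},bar{b{x,y}}, baz{c,d,e}";

my @result;
while ($string =~ /
        \G
        (?<item>
            \w+
            (?<group>                        # a balanced {...} group
                \{
                (?: [^{}]++ | (?&group) )*   # plain text or a nested group
                \}
            )
        )
        (?: , \s* | \z )                     # a separator, or the end of the string
    /xgc) {
    push @result, $+{item};
}

# With \G and the c flag, pos() stays where parsing stopped.
my $end = pos($string) // 0;
die "cannot parse input at offset $end\n" unless $end == length $string;

print "$_\n" for @result;
```

The recursion handles nesting, and since a group must open with \{ and close with \}, an item like foo}a{ is rejected instead of accepted.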

Upvotes: 3

Alexej Magura

Reputation: 5119

Try using this regex:

(.*[}]),\s*(.*[}]),\s*(.*[{].*[}])

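and use its captures as the result, like so:

my $string = "foo{a},bar{b}, baz{c,d,e}";

my @result = $string =~ /(.*[}]),\s*(.*[}]),\s*(.*[{].*[}])/;
print "$_\n" for @result;

Note that this pattern only handles exactly three items.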

Upvotes: 1
