Reputation: 96284
Say I have a string:
my $string = "foo{a},bar{b}, baz{c,d,e}";
that uses a set of grouping characters to distinguish between two levels:
my $grouping_characters = "{}";
I would like to split this string around the "outer" commas (,) while leaving everything inside $grouping_characters intact.
For the example above, the output should be:
my @result = ("foo{a}", "bar{b}", "baz{c,d,e}");
How do I do this in Perl?
Upvotes: 2
Views: 271
Reputation: 67221
> echo "foo{a},bar{b}, baz{c,d,e}" | perl -lne 'push @a,/.*?{.*?},?/g;for(@a){print}'
foo{a},
bar{b},
baz{c,d,e}
>
Upvotes: 0
Reputation: 241868
Simple parser:
#!/usr/bin/perl
use warnings;
use strict;
my $string = 'foo{a},bar{b}, baz{c,d,e}';
my @parts;
my $inside = 0;    # current brace nesting depth
my $from = 0;      # offset where the current part starts
for my $i (0 .. length($string) - 1) {
    my $char = substr $string, $i, 1;
    if ('{' eq $char) {
        $inside++;
    } elsif ('}' eq $char) {
        $inside--;
    } elsif (',' eq $char and not $inside) {
        # Outer comma: everything since $from is one part.
        push @parts, substr $string, $from, $i - $from;
        $from = $i + 1;
    }
}
push @parts, substr $string, $from;
print "$_\n" for @parts;
Removing the whitespace is left as an exercise for the reader.
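One possible way to do that exercise, and to take the grouping characters from a variable as the question asks, is sketched below. The name split_outer is hypothetical, and the sketch assumes the delimiter string holds exactly two characters, opening first and closing second.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string on commas that lie outside the
# delimiter pair in $delims (assumed: opening character, then closing).
sub split_outer {
    my ($string, $delims) = @_;
    my ($open, $close) = split //, $delims;
    my @parts;
    my $depth = 0;    # current nesting depth
    my $from  = 0;    # offset where the current part starts
    for my $i (0 .. length($string) - 1) {
        my $char = substr $string, $i, 1;
        if    ($char eq $open)  { $depth++ }
        elsif ($char eq $close) { $depth-- }
        elsif ($char eq ',' and $depth == 0) {
            push @parts, substr $string, $from, $i - $from;
            $from = $i + 1;
        }
    }
    push @parts, substr $string, $from;
    # Trim leading and trailing whitespace from every part in place.
    s/^\s+|\s+$//g for @parts;
    return @parts;
}

print "$_\n" for split_outer('foo{a},bar{b}, baz{c,d,e}', '{}');
```

Because it counts depth rather than matching a pattern, this version also copes with nested groups such as a{b{c,d},e}.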
Upvotes: 1
Reputation: 16223
You could try:
my $string = "foo{a},bar{b}, baz{c,d,e}";
print join(",",split(/,\s*(?=\w+\{[a-z,]+\})/,$string));
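A variation on the same idea, offered only as a sketch: instead of describing what the next item looks like, split on any comma that is not followed by a closing brace before the next opening brace. This assumes the braces are balanced and never nested.

```perl
use strict;
use warnings;

my $string = 'foo{a},bar{b}, baz{c,d,e}';

# Split on a comma (with optional surrounding whitespace) unless a '}'
# appears before any further '{' or '}', which would mean the comma
# sits inside a brace group.
my @result = split /\s*,\s*(?![^{}]*\})/, $string;

print "$_\n" for @result;
```

This also drops the whitespace around the separators, so the parts come out as foo{a}, bar{b} and baz{c,d,e}.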
Upvotes: 1
Reputation: 57600
First: If you want to properly parse some programming language or configuration format, you may want to use an actual parser.
However, your task can be accomplished with regexes. Rather than writing a regex that matches the commas on which we want to split, we write a regex that matches the parts of the list:
my $regex = qr/
    \w+            # item can begin with some identifier
    \{ [^\}]* \}   # followed by some stuff in braces
    [,;]           # must end with comma or semicolon
/x;
We can then extract the matches like this:
my $string = "foo{a},bar{b}, baz{c,d,e};";
my @result = $string =~ /$regex/g;
dd @result; # using dd from Data::Dump
Output:
("foo{a},", "bar{b},", "baz{c,d,e};")
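Pretty nice. Now, we refine our regex in two ways: the delimiters come from a variable instead of being hard-coded, and the trailing separator (plus any whitespace after it) is kept out of the captured item.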
Together:
my $delims = quotemeta "{}";
my $regex = qr/
    \w+
    [$delims] [^$delims]* [$delims]
/x;
my $string = "foo{a},bar{b}, baz{c,d,e};";
my @result = $string =~ /\G ($regex) [,;] \s*/xg;
dd @result;
The \G assertion anchors the match where the previous match left off.
Output:
("foo{a}", "bar{b}", "baz{c,d,e}")
Wonderful. This could be refined further in two ways:
The character class [$delims] does not distinguish opening from closing delimiters, so foo}a{ would be a valid item….
If all of this isn't needed, the current regex should do fine.
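As a rough sketch of the first refinement, the opening and closing delimiters can be kept in separate variables so that each is only accepted in its own position. The names $open, $close and $item are illustrative, and the sketch assumes the delimiter string holds exactly two characters, opening first.

```perl
use strict;
use warnings;

# Assumption: two-character delimiter string, opening then closing.
my ($open, $close) = map { quotemeta($_) } split //, '{}';

my $item = qr/
    \w+                 # identifier
    $open               # opening delimiter only
    [^$open$close]*     # anything except either delimiter
    $close              # closing delimiter only
/x;

my $string = 'foo{a},bar{b}, baz{c,d,e};';
my @result = $string =~ /\G ($item) [,;] \s*/xg;

print "$_\n" for @result;
```

With this pattern an item such as foo}a{ no longer matches, because the closing delimiter cannot appear where the opening one is expected.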
Upvotes: 3
Reputation: 5119
Try using this regex, which is hard-coded for exactly three items:
(.*[}]),\s*(.*[}]),\s*(.*[{].*[}])
like so:
my $string = "foo{a},bar{b}, baz{c,d,e}";
print "$_\n" for $string =~ /(.*[}]),\s*(.*[}]),\s*(.*[{].*[}])/;
Upvotes: 1