Reputation: 185
I have a string of key|value pairs where the pairs are comma-separated, but the values can themselves contain commas.
For example:
"key1|val1,key2|val2_a,val2_b,val2_c,key3|val3"
I want to break this out into a hash and have hacked it with the following:
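use Data::Dumper;
my $str = "key1|val1,key2|val2_a,val2_b,val2_c,key3|val3";
# The capturing group keeps each key in the list that split returns
my @vars = split(/([^,\s]+)\|/ ,$str);
# Drop the empty leading field that precedes the first key
my @arr = splice @vars, 1;
my %hash = @arr;
print Dumper(\%hash);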
which gives me:
$VAR1 = {
'key2' => 'val2_a,val2_b,val2_c,',
'key1' => 'val1,',
'key3' => 'val3'
};
I'm looking for a more elegant way of doing this. I figure it can be done with one regex but I'm having trouble figuring it out. Can anyone point me in the right direction?
Upvotes: 1
Views: 239
Reputation: 126722
The difficulty comes in deciding where the list of values ends for each key. Most obviously it can be at the end of the string or, more obscurely, where another key|value pair starts. (This is a dreadful design. Can it be fixed before you find anything more difficult to solve?)
This solution defines a regex for a "key" string (anything but a pipe, comma, or whitespace) and then uses it to build the pattern for a complete key|value pair, which ends either at the end of the string or where another "comma, key, pipe" sequence begins.
use strict;
use warnings;
my $s = 'key1|val1,key2|val2_a,val2_b,val2_c,key3|val3';
my $key_re = qr/ [^|,\s]+ /x;
my @pairs = $s =~ / $key_re \| [^|\s]+ (?= \z | , $key_re \| )/gx;
print "$_\n" for @pairs;
Output:
key1|val1
key2|val2_a,val2_b,val2_c
key3|val3
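Since the goal is a hash, one possible extension is to capture the key and the value separately. In list context a /g match returns every capture in order, so the result can be assigned straight to a hash. This is only a sketch: the capture groups and the name %hash are additions for illustration, and the lookahead is the same as in the code above.

```perl
use strict;
use warnings;

my $s = 'key1|val1,key2|val2_a,val2_b,val2_c,key3|val3';

# Same key pattern as before: anything but a pipe, comma, or whitespace
my $key_re = qr/ [^|,\s]+ /x;

# Capture the key and the value of each pair; the lookahead is unchanged
my %hash = $s =~ / ($key_re) \| ( [^|\s]+ ) (?= \z | , $key_re \| ) /gx;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

This should print one "key => value" line per pair, with the value for key2 keeping its embedded commas.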
Upvotes: 1
Reputation: 5129
Try using a positive lookahead in the split.
#!/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $str = "key1|val1,key2|val2_a,val2_b,val2_c,key3|val3";
my %hash = split(/\||,(?=\w+\|)/, $str);
print Dumper(\%hash);
Output:
$VAR1 = {
'key2' => 'val2_a,val2_b,val2_c',
'key1' => 'val1',
'key3' => 'val3'
};
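To see why this works, it can help to look at the flat list that split returns before it is assigned to the hash. The sketch below spells out the same pattern with /x and comments; the name @fields is just for illustration. Note that \w+ assumes keys consist only of word characters (letters, digits, underscore), so a key such as my-key would not be recognised by the lookahead.

```perl
use strict;
use warnings;

my $str = "key1|val1,key2|val2_a,val2_b,val2_c,key3|val3";

my @fields = split /
      \|              # a pipe separates a key from its value
    |                 # or
      , (?= \w+ \| )  # a comma, but only if a key and a pipe follow it
/x, $str;

print "$_\n" for @fields;
```

The list alternates key, value, key, value, which is why assigning it to %hash pairs the elements up correctly.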
I've found that this tutorial does a good job of explaining lookarounds.
Upvotes: 4