Reputation: 3234
I'm trying to match all occurrences in a config file and store them in a Perl hash. Here is the config file I'm trying to parse:
# commented area
# still commenting
Key1: Value1
Key2: 2013/03/04 15:41:30
Key3: Value with spaces whatever you pass here fits
Key4:
value5
value6
value7
Key5:
some other multiline value
for testing purpose
I've created this regex, which unfortunately is not fully functional: Key4 contains only value5, and Key5 is entirely missing.
Regex:
/^(\w+)\:\s*(.+?)(?=^[^\:]+\:)/smg
Any idea how to improve it?
Upvotes: 1
Views: 104
I'm a little late to this.
I think the regex should be simple and line-oriented.
The real logic belongs in the body of the matching while loop.
This is roughly how I would do it (there are many ways; this one is basic).
( ^ [^#:\n]+ ) # (1), Key
: # :
| # or,
(?: ^ [^\S\n]* \# .* \n? ) # BOL comments
| # or,
\# .* # EOL comments
| # or,
( [^#\n]* \n? ) # (2), Line value
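The expanded pattern above can be used almost verbatim with the /x flag, which ignores whitespace and allows trailing comments. The following is only a minimal sketch: the names $line_re, $text, @keys and @values are made up for illustration, and the sample string is not from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same alternation as the expanded form, compiled with /x so the layout
# and comments can stay in the source, and /m so ^ matches at each line.
my $line_re = qr{
      ( ^ [^#:\n]+ )              # (1), Key
      :                           # :
   |                              # or,
      (?: ^ [^\S\n]* \# .* \n? )  # BOL comments
   |                              # or,
      \# .*                       # EOL comments
   |                              # or,
      ( [^#\n]* \n? )             # (2), Line value
}mx;

# Hypothetical sample input, one comment line and one key with a value.
my $text = "# note\nKey1: Value1 # trailing\nmore\n";

my (@keys, @values);
while ($text =~ /$line_re/g) {
    push @keys,   $1 if defined $1;                 # a key was matched
    push @values, $2 if defined $2 && length $2;    # a piece of value text
}

print "keys:   @keys\n";
print "values: ", join('', @values);
```

Each pass of the loop consumes exactly one token (key, comment, or value text), which is why the surrounding loop body, not the regex, decides what belongs to which key.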
Perl test case
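use strict;
use warnings;

$/ = undef;               # slurp mode: read the whole DATA section at once
my $str     = <DATA>;
my $key     = undef;      # key currently being collected
my $keyval  = "";         # value text accumulated for $key
my %keyhash = ();
my @order   = ();         # keys in order of appearance

while ($str =~ /(^[^#:\n]+):|(?:^[^\S\n]*\#.*\n?)|\#.*|([^#\n]*\n?)/mg)
{
    if ( defined $1 ) {
        # A new key starts: store the previous key's value first.
        if (defined $key) {
            $keyval =~ s/\s+$//;
            $keyhash{ $key } = $keyval;
        }
        ($key, $keyval) = ($1, "");
        $keyhash{ $key } = "";
        push @order, $key;
        next;
    }
    if ( defined $2 && defined $key ) {
        $keyval .= $2;    # value text, possibly spanning several lines
    }
}
# Store the value of the last key.
if ( defined $key ) {
    $keyval =~ s/\s+$//;
    $keyhash{ $key } = $keyval;
}

foreach $key ( @order ) {
    print "'$key' = '$keyhash{$key}'\n";
}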
__DATA__
# commented area
Key0:
# still commenting
Key1: Value1
Key2: 2013/03/04 15:41:30 # line end comment
#asfgasfg
stuff
#asfgasfg
here
Key3: Value with spaces whatever you pass here fits
Key4:
value5
value6
value7
Key5:
some other multiline value
for testing purpose
Output >>
'Key0' = ''
'Key1' = ' Value1'
'Key2' = ' 2013/03/04 15:41:30
stuff
here'
'Key3' = ' Value with spaces whatever you pass here fits'
'Key4' = '
value5
value6
value7'
'Key5' = '
some other multiline value
for testing purpose'
Upvotes: 0
Reputation: 24073
This looks like YAML*. You can use YAML::XS (which requires the libyaml
C library) to parse your file and store it in a scalar:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use YAML::XS;
my $yaml = YAML::XS::LoadFile('config.yaml');
print Dumper $yaml;
Output:
$VAR1 = {
'Key5' => 'some other multiline value for testing purpose',
'Key2' => '2013/03/04 15:41:30',
'Key1' => 'Value1',
'Key4' => 'value5 value6 value7',
'Key3' => 'Value with spaces whatever you pass here fits'
};
Note that $yaml is a hash reference.
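Because the result is a reference and not a plain hash, values are reached with the arrow operator or by dereferencing. A minimal sketch, using a hand-written stand-in for the loaded structure so it runs without YAML::XS (the variables $key1 and %config are only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the structure LoadFile returns; values copied from the dump above.
my $yaml = {
    Key1 => 'Value1',
    Key4 => 'value5 value6 value7',
};

# Read a single value through the reference.
my $key1 = $yaml->{Key1};
print "Key1 = $key1\n";

# Dereference the whole hash to iterate over it.
for my $key (sort keys %$yaml) {
    print "$key => $yaml->{$key}\n";
}

# Or copy it into a plain hash if that is what the rest of the code expects.
my %config = %$yaml;
```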
* Assuming value5, value6, and value7 make up a single scalar instead of multiple elements in a sequence. Also assuming your multi-line values contain no tabs, :, or #. This is certainly assuming a lot, but may work in your case.
Upvotes: 0
Reputation: 71578
You might try something like this:
^(\w+):\s*(.+?)(?=^[^\n\r:]+:|\z)
I removed the escapes on the colons (:) and inserted \n\r in the negated class. (?=^[^\:]+\:) was being satisfied at the end of value5, so the (.+?) was reluctant to continue matching.
Using the \r\n inside as well forces the (.+?) to keep matching until the next line matches ^[^:]+:.
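As a rough illustration, here is how that pattern might be applied with the same /smg flags as in the question. This is only a sketch: the sample text is a shortened version of the question's file, and %conf and the trailing-whitespace trim are my own additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample modeled on the question's config file (comments omitted).
my $config = <<'END';
Key1: Value1
Key4:
value5
value6
value7
Key5:
some other multiline value
for testing purpose
END

my %conf;
# /s lets . cross newlines, /m makes ^ match at each line start, /g iterates.
while ($config =~ /^(\w+):\s*(.+?)(?=^[^\n\r:]+:|\z)/smg) {
    my ($key, $value) = ($1, $2);
    $value =~ s/\s+\z//;    # drop the trailing newline captured by (.+?)
    $conf{$key} = $value;
}

print "$_ = [$conf{$_}]\n" for sort keys %conf;
```

With this sample, Key4 keeps all three value lines and Key5 is no longer dropped, because the lookahead can only succeed at a line that itself has the key: form, or at the very end of the input.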
Then I added |\z so that the last key can match up to the end of the input. The drawback is that the match may also capture comments that follow a value, so if the above doesn't suit you, maybe try something like this instead:
^(\w+):\s*((?:(?!^[^\r\n:]+:|^#).)+)
This time, I replaced the lazy .+? with a greedy repetition that checks, before each character, that the current position does not start a key line (^[^\r\n:]+:, with the newlines excluded for the same reason as before) or a comment line (^#). Possible issues: a comment line in the middle of a multi-line value ends the value early, and comments that do not start at the beginning of a line still end up in the value.
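A small sketch of this second pattern in use, again with the /smg flags from the question. The sample text and the %conf hash are assumptions for illustration; the comment lines are there to show that they stay out of the captured values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with comment lines between the keys.
my $config = <<'END';
# commented area
Key1: Value1
# still commenting
Key4:
value5
value6
Key5:
last value
END

my %conf;
# The negative lookahead runs before every character, so the value stops
# as soon as the current position starts a key line or a comment line.
while ($config =~ /^(\w+):\s*((?:(?!^[^\r\n:]+:|^#).)+)/smg) {
    my ($key, $value) = ($1, $2);
    $value =~ s/\s+\z//;    # trim the newline left at the end of the value
    $conf{$key} = $value;
}

print "$_ = [$conf{$_}]\n" for sort keys %conf;
```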
There should be existing config file parsers out there, and I believe one of those would be a better fit for this kind of task.
Upvotes: 1