
Reputation: 2941

functional Perl: Filter, Iterator

I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like

# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3

I want a function that I pass an iterator over the lines of the file and that returns a map from keys to lists of values. But to be functional and structured I'd like to:

use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
The mentioned filter(s) should be defined outside of the function for reusability by other functions.
use another function that is given a line and returns a tuple of the key and the values string
use another function that breaks the comma-separated values into a list of values.

What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.

For reference, here is (a quick hack) how I might do it in Python:

import re
from itertools import ifilter, imap   # Python 2 only
from string import strip              # Python 2 only

re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: (not ("" == line or re_is_comment_line.match(line))
                                    and re_key_values.match(line))

def parse(lines):
    interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
    key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
    splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
    return dict(splitted_values)

Upvotes: 2

Answers (4)

user1760725

Reputation:

As indicated in the posts linked to by @collapsar, Higher-Order Perl is a great read for exploring functional techniques in Perl.

Here is an example that hits your bullet points:

use strict;
use warnings;
use Data::Dumper;

my @filt_rx = ( qr{^\s*\#},
                qr{^[\r\n]+$} );
my $kv_rx = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};

my $iterator = sub {
    my ($fh) = @_;
    return sub {
        my $line = readline($fh);
        return $line;
    };
};
my $filter = sub {
    my ($it,@r) = @_;
    return sub {
        my $line;
        do {
            $line = $it->();
        } while (  defined $line
                && grep { $line =~ m/$_/} @r );
        return $line;
    };
};
my $kv = sub {
    my ($line,$rx) = @_;
    return ($line =~ m/$rx/);
};
my $spl = sub {
    my ($values,$rx) = @_;
    return split $rx, $values;
};

my $it = $iterator->( \*DATA );
my $f = $filter->($it,@filt_rx);

my %map;
while ( my $line = $f->() ) {
    my ($k,$v) = $kv->($line,$kv_rx) or next; # skip lines that are not "key = value"
    $map{$k} = [ $spl->($v,$spl_rx) ];
}
print Dumper \%map;

__DATA__
# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3

It produces the following hash on the provided input:

$VAR1 = {
          'key2' => [
                      'value1',
                      'value2',
                      'value3'
                    ],
          'key1' => [
                      'value'
                    ]
        };
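The closures above do not depend on __DATA__: any filehandle works, including an in-memory one opened on a string, which makes each stage easy to exercise in a test. A minimal sketch along the same lines; make_line_iterator and make_filter are hypothetical names, and the filter loops with while instead of do/while:

```perl
use strict;
use warnings;

# Hypothetical constructor: wraps a filehandle in a closure that returns
# one line per call and undef at end of input.
sub make_line_iterator {
    my ($fh) = @_;
    return sub { scalar readline($fh) };
}

# Hypothetical combinator: returns a new iterator that skips every line
# matching any of the given regexes (defined outside any parser, so reusable).
sub make_filter {
    my ($it, @skip) = @_;
    return sub {
        while (defined(my $line = $it->())) {
            return $line unless grep { $line =~ $_ } @skip;
        }
        return;    # underlying iterator is exhausted
    };
}

# An in-memory filehandle stands in for a real file.
my $text = "# comment\n\nkey1 = value\n  # indented comment\nkey2 = a, b\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $lines = make_filter(make_line_iterator($fh), qr/^\s*#/, qr/^\s*$/);

my @kept;
while (defined(my $line = $lines->())) {
    chomp $line;
    push @kept, $line;
}
print "$_\n" for @kept;    # prints "key1 = value" then "key2 = a, b"
```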

Upvotes: 2

collapsar

Reputation: 17248

You might be interested in this SO question as well as this one.

The following code is a self-contained Perl script meant to give you an idea of how to implement this in Perl (only partially in a functional style; if you aren't put off by this particular coding style and/or language constructs, I can refine the solution somewhat).

Miguel Prz is right that in most cases you'd search CPAN for solutions to match your requirements.

use strict;
use warnings;
use Data::Dumper;

my (
      $is_interesting_line
    , $re_is_comment_line
    , $re_key_values
    , $re_splitter
);

$re_is_comment_line = qr(^\s*#);
$re_key_values      = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter        = qr(\s*,\s*);
$is_interesting_line = sub {
        my $line = shift;
        return (
                defined($line)
            &&  ($line ne '')
            &&  ($line !~ /$re_is_comment_line/)
            &&  ($line =~ /$re_key_values/)
        );
    };

sub strip {
    my $line = shift;
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $line;
}
sub parse {
    my @lines = @_;
    #
    my (
          $dict
        , $interesting_lines
        , $k
        , $v
    );
    #
    @$interesting_lines =
        grep {
                &{$is_interesting_line} ( $_ );
            } ( map { strip($_); } @lines )
    ;

    $dict = {};
    # a for loop rather than map, since this runs only for its side effects
    for (@$interesting_lines) {
        if ($_ =~ /$re_key_values/) {
            ($k, $v) = ($1, [split(/$re_splitter/, $2)]);
            $$dict{$k} = $v;
        }
    }

    return $dict;
} # parse

#
# sample execution goes here
#
my $input =<<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL

print Dumper( parse( split(/[\r\n]+/, $input) ) );

Upvotes: 0

amon

Reputation: 57640

A direct translation of your Python would be

my $re_is_comment_line = qr/^\s*#/;
my $re_key_values      = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter        = qr/\s*,\s*/;
my $is_interesting_line= sub {
  local $_ = shift;   # lexical "my $_" was removed in Perl 5.24
  length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};

sub parse {
  my @lines = @_;
  my @interesting_lines = grep $is_interesting_line->($_), @lines;
  my @key_values = map [/$re_key_values/], @interesting_lines;
  my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } @key_values;
  return %splitted_values;
}

Differences are:

ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.
Perl doesn't emphasize laziness and seldom uses iterators. There are cases where they are needed (e.g. reading a large file line by line), but usually the whole list is evaluated at once.
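Laziness is still available when it is wanted: an iterator is just a closure, so analogues of Python's ifilter/imap take a few lines each. A minimal sketch; igrep, imap and list_iter are hypothetical helper names (not core Perl functions), and the s///r flag assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical lazy combinators in the spirit of Python 2's itertools.ifilter
# and itertools.imap. An "iterator" is any code ref that returns the next
# item on each call and undef once it is exhausted.
sub igrep {
    my ($pred, $it) = @_;
    return sub {
        while (defined(my $x = $it->())) {
            local $_ = $x;                 # let the predicate use $_ like grep
            return $x if $pred->($x);
        }
        return;                            # exhausted
    };
}

sub imap {
    my ($f, $it) = @_;
    return sub {
        my $x = $it->();
        return unless defined $x;
        local $_ = $x;                     # let $f use $_ like map
        return $f->($x);                   # note: an undef result ends the stream
    };
}

# Hypothetical source iterator over a list; a filehandle would work the same way.
sub list_iter {
    my @items = @_;
    return sub { @items ? shift @items : undef };
}

# Reusable pieces, each testable on its own.
my $trim         = sub { s/^\s+|\s+$//gr };                  # /r needs Perl 5.14+
my $is_key_value = sub { length($_) && !/^\s*#/ && /^\s*\w+\s*=/ };
my $split_kv     = sub { [ /^\s*(\w+)\s*=\s*(.*)$/ ] };

my $pairs = imap($split_kv, igrep($is_key_value, imap($trim,
    list_iter('# comment', '', 'key1 = value', 'key2 = a, b'))));

my %config;
while (defined(my $kv = $pairs->())) {
    my ($k, $v) = @$kv;
    $config{$k} = [ split /\s*,\s*/, $v ];
}
```

Nothing is read from the source until $pairs->() is called, and each stage only pulls one item at a time from the stage below it.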

The next example makes the following changes:

Regexes don't have to be precompiled: a regex literal that interpolates no variables is compiled only once, when the program is compiled.
Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
The whole map/grep pipeline can be written as one expression. This doesn't make it more efficient, but it emphasizes the flow of data. Read the map-map-grep from bottom upwards (actually right to left, think of APL).

sub parse {
  my %splitted_values =
    map { $_->[0], [split /\s*,\s*/, $_->[1]] }
    map {[split /\s*=\s*/, $_, 2]}
    grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
    @_;
  return \%splitted_values; # returning a reference improves efficiency
}
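The limit of 2 passed to split in the second map matters when a value itself contains "=". A small illustration (the sample line is made up):

```perl
use strict;
use warnings;

my $line = 'url = http://example.com/?a=1';

# Without a limit, every "=" is a separator, so the value gets cut up.
my @unlimited = split /\s*=\s*/, $line;

# With LIMIT 2, split stops after the first separator: (key, rest of line).
my ($key, $value) = split /\s*=\s*/, $line, 2;

print scalar(@unlimited), "\n";   # 3
print "$key => $value\n";         # url => http://example.com/?a=1
```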

But I think a more elegant solution here is to use a traditional loop:

sub parse {
  my %splitted_values;
  LINE: for (@_) {
    next LINE if !length or /^\s*#/;
    (my $line = $_) =~ s/\A\s+|\s+\z//g; # Trim a copy (omitted in previous examples); $_ aliases the caller's data
    my ($key, $vals) = split /\s*=\s*/, $line, 2;
    defined $vals or next LINE; # check if $vals was assigned
    @{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
  }
  return \%splitted_values
}

If we decide to pass a filehandle instead, the loop would be replaced with

my $fh = shift;
LINE: while (<$fh>) {
  chomp;
  ...; # the rest of the loop body as above
}

which would use an actual iterator.
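Put together, a filehandle-based version might look like the following sketch. parse_fh is a hypothetical name, and an in-memory filehandle (open on a scalar reference, supported since Perl 5.8) stands in for a real file. It trims before the emptiness and comment checks, so whitespace-only lines are skipped too:

```perl
use strict;
use warnings;

# Sketch: the loop-based parser, reading from a filehandle so that <$fh>
# acts as the iterator (one line per call, read lazily).
sub parse_fh {
    my ($fh) = @_;
    my %splitted_values;
    LINE: while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\A\s+|\s+\z//g;                 # trim
        next LINE if !length $line or $line =~ /^#/;
        my ($key, $vals) = split /\s*=\s*/, $line, 2;
        defined $vals or next LINE;                # not a "key = value" line
        $splitted_values{$key} = [ split /\s*,\s*/, $vals ];
    }
    return \%splitted_values;
}

# An in-memory filehandle stands in for a real file here.
my $text = <<'END';
# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3
END
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $config = parse_fh($fh);
print "$_: @{ $config->{$_} }\n" for sort keys %$config;
# key1: value
# key2: value1 value2 value3
```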

You could now go and add function-valued parameters (callbacks), but do this only if you are optimizing for flexibility and nothing else. I already used a code reference in the first example; you invoke one with the $code->(@args) syntax.

use Carp; # Error handling for writing APIs
sub parse {
  my $args = shift;
  my $interesting  = $args->{interesting}   or croak qq("interesting" callback required);
  my $kv_splitter  = $args->{kv_splitter}   or croak qq("kv_splitter" callback required);
  my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default

  my %splitted_values;
  LINE: for (@_) {
    next LINE unless $interesting->($_);
    (my $line = $_) =~ s/\A\s+|\s+\z//g; # trim a copy, leaving the caller's data untouched
    my ($key, $vals) = $kv_splitter->($line);
    defined $vals or next LINE;
    $splitted_values{$key} = $val_transform->($vals);
  }
  return \%splitted_values;
}

This could then be called like

my $data = parse {
  interesting   => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
  kv_splitter   => sub { split /\s*=\s*/, $_[0], 2 },
  val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, @lines;
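Because each callback is an ordinary code reference, it can be unit-tested in isolation, which addresses the "separately testable" requirement from the question. A minimal sketch using the core Test::More module, with the three callbacks copied from the call above:

```perl
use strict;
use warnings;
use Test::More tests => 5;

# The callbacks from the example call, held in named variables.
my $interesting   = sub { length($_[0]) and not $_[0] =~ /^\s*#/ };
my $kv_splitter   = sub { split /\s*=\s*/, $_[0], 2 };
my $val_transform = sub { [ split /\s*,\s*/, $_[0] ] };

ok(  $interesting->('key1 = value'), 'key/value line is interesting' );
ok( !$interesting->('# comment'),    'comment line is skipped' );
ok( !$interesting->(''),             'empty line is skipped' );
is_deeply( [ $kv_splitter->('key2 = a = b') ], [ 'key2', 'a = b' ],
           'only the first "=" separates key from value' );
is_deeply( $val_transform->('value1, value2,value3'),
           [ 'value1', 'value2', 'value3' ], 'values are split on commas' );
```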

Upvotes: 5

Miguel Prz

Reputation: 13792

I think the most modern approach is to take advantage of CPAN modules. For your example, Config::Properties may help:

use strict;
use warnings;
use Config::Properties;

my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');

Upvotes: 4
