Satish Sajjanar

Reputation: 35

How to use Getopt::Long to store options in complex hash

I'm trying to get options and store them in a hash using Getopt::Long. Below is my code.

#!/bin/perl

use strict;
use feature "say";
use Data::Dumper;
use Getopt::Long;
my %model;

GetOptions(
     "model=s"        => \%model,
);

say Dumper \%model;

With this approach, each key can hold only a single value, but I actually need a nested hash so that each key can store multiple values. Below is my current output.

my_code.pl -model key1=value1 -model key2=value2

output:
$VAR1 = {
          'key2' => 'value2',
          'key1' => 'value1'
        };

What I need is something like below.

$VAR1 = {
          'key2' => {
                      'value3' => 'undef'
                     },
          'key1' => {
                      'value1' => 'undef',
                      'value2' => 'undef'
                     }
        };

Upvotes: 3

Views: 548

Answers (3)

brian d foy

Reputation: 132792

There are two ways that people approach these problems:

  • choose an interface, and force a tool to process it
  • choose a tool, and force its interface on people

I have a chapter in Mastering Perl about various configuration methods, options, and tools. It's one of the things people tend not to teach because it's not syntax or features. However, it's the high touch part of the process for the people that use what you create.

In short, the field is a mess with no standards and few best practices. I like a hybrid approach: use a tool to do most of the processing and adapt its output to the program. This way, I'm not stuck with what the tool allows.

I've always found that the --switch key=value interface is a bit of a mess. And, it looks like you want to be able to duplicate keys so a key can have multiple values.

You have this example:

% my_code.pl -model key1=value1 -model key2=value2
$VAR1 = {
          'key2' => 'value2',
          'key1' => 'value1'
        };

But consider this case, where key2 shows up twice:

% my_code.pl  --model key1=value1 --model key2=value2 --model key2=value3

The last value for key2 wins, and there's not even a warning that one of the values will be ignored:

$VAR1 = {
          'key2' => 'value3',
          'key1' => 'value1'
        };
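
The overwrite is easy to reproduce in isolation. Here is a minimal sketch, assuming the same hash destination as in the question; the helper name parse_model_hash is made up for illustration and is not part of Getopt::Long:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments using the hash
# destination from the question.
sub parse_model_hash {
    my @args = @_;
    my %model;
    GetOptionsFromArray( \@args, 'model=s' => \%model )
        or die "Could not parse options\n";
    return \%model;
}

my $model = parse_model_hash(
    qw(--model key1=value1 --model key2=value2 --model key2=value3)
);

# key2 was given twice, but only the final value is kept:
#   key1 => value1
#   key2 => value3
print "$_ => $model->{$_}\n" for sort keys %$model;
```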

First, I'd think long and hard about how you really want to get that info into your program (lest you end up like ffmpeg's almost impossible interface). But, let's set that aside.

You're thinking about a hash so you reached for the hash features in Getopt::Long. But, you should uncouple the interface from the internal storage. You don't really want the command line processing tool deciding how your data structures work and you don't want the other way around either. That locks in your choice. It's nice if the tool can give you what you like, but it's better if you don't care what the tool does as long as you get your result.

Many of my programs start with something like this subroutine call. I don't know what that subroutine call does or how it does it or what it uses to do that. I know only that it returns something my program can use. I can switch out everything and do it completely differently as long as it returns the same thing for the same input:

my $model = process_args( @ARGV );

And, this has a side benefit that I don't have to use just the command line values because I can pass that subroutine any list I like (testing, natch).

So then, what goes in there? In this case, setting aside the interface issue I mentioned earlier, what should I do?

  • I know that -model can show up more than once
  • I expect it to have an argument like key=value
  • I want to be able to specify multiple values for a key

Simplest thing first

So, here's a first go because this is what I tend to do right away (having struggled with and been scarred by this task for decades). I'm going to treat the arguments to --model as single strings and let Getopt::Long process them as an array. I'll post-process them later:

use v5.20;
use experimental qw(signatures);
use Data::Dumper;

my @args = qw(--model key=value --model key=value2 --model key2=valueA );

my $model = process_args( @args );

say Dumper $model;

sub process_args ( @args ) {
    state $rc = require Getopt::Long;

    my %config;

    Getopt::Long::GetOptionsFromArray(
        \@args,
        "model=s@" => $config{model} = [],
        );

    return \%config;
    }

The output shows that I have all the input, so that's a good step:

$VAR1 = {
          'model' => [
                       'key=value',
                       'key=value2',
                       'key2=valueA'
                     ]
        };

Now adapt it for local use

Now I can massage that a bit, doing basically what Dave Cross did in his answer. I do a little processing after Getopt::Long has done its job. It knows how to break apart tokens on the command line, but I know what those tokens mean. Thus, once I have them organized, I'll be the one to interpret them:

use v5.26;  # for postfix deref (->@*)
sub process_args ( @args ) {
    state $rc = require Getopt::Long;

    my %config;

    Getopt::Long::GetOptionsFromArray(
        \@args,
        "model=s@" => $config{model} = [],
        );

    my %hash;
    foreach my $string ( $config{model}->@* ) {
        my( $key, $value ) = split /=/, $string, 2;
        push $hash{$key}->@*, $value;
        }

    $config{model} = \%hash;

    return \%config;
    }

Now I have this data structure, where each key has an array ref of values. That's not quite what you said you wanted, but I also don't know what you are doing with the multi-level hash in which all the values are undef. I think this is easier if you want to get just the value names from the command line:

$VAR1 = {
          'model' => {
                       'key' => [
                                  'value',
                                  'value2'
                                ],
                       'key2' => [
                                   'valueA'
                                 ]
                     }
        };

Make hash of hashes

You might want to fill in your undef values later, but I tend to keep the input data separate from the generated data. For me it helps with logging and reporting. But, whatever. The trick is to make a data structure that is most suited to your task so it's easy to work with.

To get what you showed, it's a one line change. That's an important point and part of the reason I took that route. I don't have to re-engineer everything I told Getopt::Long:

sub process_args ( @args ) {
    ...

    my %hash;
    foreach my $string ( $config{model}->@* ) {
        my( $key, $value ) = split /=/, $string, 2;
        $hash{$key}{$value} = undef;  # single line change
        }

    ...
    }

lordadmira's answer takes the long way to get to this method of handling it directly in Getopt::Long, where you give each specifier a code reference to further process the result. This is fine as you see it here, but I find his answer quite unwieldy to look at or maintain (although some people will think that about mine, too):

sub process_args ( @args ) {
    state $rc = require Getopt::Long;

    my %config;

    Getopt::Long::GetOptionsFromArray(
        \@args,
        "model=s%" => sub { my($n, $k, $v) = @_; $config{model}{$k}{$v} = undef; }
        );

    return \%config;
    }

Now the interface changes

Let's approach this from another angle. Instead of specifying key2 twice (the first line), what if I could specify it once and give it multiple values (the second line):

% my_code.pl  --model key1=value1 --model key2=value2 --model key2=value3
% my_code.pl  --model key1=value1 --model key2=value2,value3

The change isn't that bad, and again, I don't need to mess with the particular tool I chose to process the command line:

sub process_args ( @args ) {
    ...
    my %hash;
    foreach my $string ( $config{model}->@* ) {
        my( $key, $value ) = split /=/, $string, 2;
        my @values = split /,/, $value;
        $hash{$key}{$_} = undef for @values;
        }

    ...
    }

The output shows I picked up multiple options with the same key, and multiple values in one option:


% my_code.pl  --model key1=value1 --model key2=value2,value4 --model key2=value3
$VAR1 = {
          'model' => {
                       'key1' => {
                                   'value1' => undef
                                 },
                       'key2' => {
                                   'value2' => undef,
                                   'value3' => undef,
                                   'value4' => undef
                                 }
                     }
        };
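
Putting the pieces together, here is a self-contained sketch of that approach. It is only one way to fill in the elided parts above: it reads arguments from @_ instead of using signatures so that it runs on older Perls, and skipping arguments that lack an = is an assumption made here, not something the snippets above do.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

sub process_args {
    my @args = @_;

    # Let Getopt::Long collect each --model argument as an opaque string
    my @raw;
    GetOptionsFromArray( \@args, 'model=s@' => \@raw )
        or die "Could not parse options\n";

    # Interpret the strings ourselves: key=value1,value2,...
    my %hash;
    foreach my $string (@raw) {
        my ( $key, $value ) = split /=/, $string, 2;
        next unless defined $value;    # assumption: ignore arguments without '='
        $hash{$key}{$_} = undef for split /,/, $value;
    }

    return { model => \%hash };
}

my $config = process_args(
    qw(--model key1=value1 --model key2=value2,value4 --model key2=value3)
);

# Prints:
#   key1: value1
#   key2: value2 value3 value4
foreach my $key ( sort keys %{ $config->{model} } ) {
    my @values = sort keys %{ $config->{model}{$key} };
    print "$key: @values\n";
}
```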

One more thing

Now, there's something (well, at least one thing) I ignored. Getopt::Long and friends process @ARGV (or, with GetOptionsFromArray, the array you pass in) and remove whatever they feel belongs to them. There can be additional arguments on the command line that don't belong to options. If that's important to you, you probably want to return the leftover bits of the arguments array:

my( $model, $leftovers ) = process_args( @args );

say Dumper( $model, $leftovers );

sub process_args ( @args ) {
    state $rc = require Getopt::Long;

    ...

    return \%config, \@args;
    }
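
A runnable sketch of the same idea, with the elided middle reduced to the simplest case (raw --model strings only). The file names are placeholders, and the options come first so the result does not depend on whether argument permutation is enabled:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

sub process_args {
    my @args = @_;    # private copy; the caller's list is left untouched

    my @raw;
    GetOptionsFromArray( \@args, 'model=s@' => \@raw )
        or die "Could not parse options\n";

    # GetOptionsFromArray has removed the options it recognized from @args,
    # so whatever remains are the non-option arguments.
    return ( { model => \@raw }, \@args );
}

my ( $config, $leftovers ) = process_args(
    qw(--model key1=value1 --model key2=value2 input.txt output.txt)
);

# Prints:
#   models:    key1=value1 key2=value2
#   leftovers: input.txt output.txt
print "models:    @{ $config->{model} }\n";
print "leftovers: @$leftovers\n";
```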

Upvotes: 4

Dave Cross

Reputation: 69244

I think you're pushing the module to its limits. But that's ok. That's why the module has the catch-all feature of using a subroutine to process your option.

#!/usr/bin/perl

use strict;
use feature "say";
use Data::Dumper;
use Getopt::Long;
my %model;

sub process_opt {
  my ($name, $val) = @_;
  my ($k, $v) = split /=/, $val, 2;
  $model{$k}{$v} = undef;
}

GetOptions(
     "model=s" => \&process_opt,
);

say Dumper \%model;

Upvotes: 3

lordadmira

Reputation: 1832

If you pass a hash reference as the first argument to GetOptions, all of the options are stored in that hash. It's a great way to keep all of the options together in one place. The option spec can then be put into a qw list.

You can put a code ref directly in the hash to create an option with multiple effects or to manage your key value pairs. By specifying the s% option type, Getopt::Long will split the value for you and feed it to your code ref.

Example:

use strict;
use diagnostics;
use Getopt::Long;

our %model;
## %options must be declared separately because it is referenced in its own definition
our %options;
%options = (
  # this code ref receives the name, key, and value as arguments
  model => sub { my($n, $k, $v) = @_; $model{$k}{$v} = undef; },

  # set default debug level
  debug => 0,
  # set default file name
  file => "file.dat",

  # option with multi-effects, setting debug and verbose at once
  quiet => sub { @options{qw/debug verbose/} = (0, 0); },
  loud  => sub { @options{qw/debug verbose/} = (999, 1); },
);

GetOptions(\%options,
  qw/debug+ verbose! file=s length=o quiet loud model=s%/
);

our $argument = shift @ARGV;
die "missing first argument\n" unless defined $argument;

print "Starting program $0 on $argument\n" if $options{verbose};

if ($options{debug} >= 2) {
  ## Load this module only if we need it, but you must guarantee it's there or trap the error with eval{}
  require Data::Dump;
  printf "Dumping options hash\n%s\n", Data::Dump::pp(\%options);
}
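
Stripped of the other options, the s% callback part can be sketched on its own. This assumes only the documented behavior that a code ref attached to a hash-valued option is called with the option name, the key, and the value; parse_models is a hypothetical helper name used for illustration:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_models is a made-up name for this sketch.
sub parse_models {
    my @args = @_;
    my %model;

    GetOptionsFromArray(
        \@args,
        # For a hash-valued option (s%), Getopt::Long splits key=value
        # and passes the pieces to the callback.
        'model=s%' => sub {
            my ( $name, $key, $value ) = @_;
            $model{$key}{$value} = undef;
        },
    ) or die "Could not parse options\n";

    return \%model;
}

my $model = parse_models(
    qw(--model key1=value1 --model key1=value2 --model key2=value3)
);

# Prints:
#   key1: value1, value2
#   key2: value3
foreach my $key ( sort keys %$model ) {
    print "$key: ", join( ', ', sort keys %{ $model->{$key} } ), "\n";
}
```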

Upvotes: 0
