user254173

Reputation: 152

Pass a slice of hash as argument in Perl

I have data in a hash that looks like this:

my %inputData;
$inputData{'312'} = 'foobar';
$inputData{'112'} = 'qwerty';
$inputData{'232'} = 'test123';
$inputData{'221'} = 'asdfg';

and so forth.

I use forks to analyze the data, $n forks in total. The process() function launches a new fork to do the data analysis, like so:

for my $i ( 0 .. $n-1 )
{
    process( ... );
}

How can I pass a hash reference as an argument to the process() function that contains a slice of the %inputData?

For example, if $n = 2, the loop would run two iterations, and the first iteration would do:

my %hashSlice;
$hashSlice{'312'} = 'foobar';
$hashSlice{'112'} = 'qwerty';
process(\%hashSlice);

and at second iteration do:

my %hashSlice;
$hashSlice{'232'} = 'test123';
$hashSlice{'221'} = 'asdfg';
process(\%hashSlice);

Or, if $n = 3, the loop would run three iterations, and the first iteration would do:

my %hashSlice;
$hashSlice{'312'} = 'foobar';
$hashSlice{'112'} = 'qwerty';
process(\%hashSlice);

at second iteration do:

my %hashSlice;
$hashSlice{'232'} = 'test123';
process(\%hashSlice);

and at third iteration do:

my %hashSlice;
$hashSlice{'221'} = 'asdfg';
process(\%hashSlice);

Upvotes: 1

Views: 104

Answers (3)

ikegami

Reputation: 385764

If the point of this is to split work among workers, a worker pool model in which workers grab work from a common queue would work better. The Parallel::ForkManager solution by Sobrique is an example of this (though it might be better to reuse the workers).


A simple solution:

my %data       = ...;
my $num_groups = ...;

my @groups;
my $i = 0;
for my $key (keys(%data)) {
   $groups[$i]{$key} = $data{$key};
   $i = ($i + 1) % $num_groups;
}
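As a rough illustration of how the round-robin grouping above might be used, here is a minimal sketch with the question's sample data and three groups. The process() stub is a hypothetical stand-in for the question's function, and the keys are sorted only so that the grouping is reproducible between runs (hash order is otherwise unspecified).

```perl
use strict;
use warnings;

# Sample data taken from the question.
my %data = (
    312 => 'foobar',
    112 => 'qwerty',
    232 => 'test123',
    221 => 'asdfg',
);
my $num_groups = 3;

# Deal the keys out round-robin, one group per worker.
my @groups;
my $i = 0;
for my $key ( sort keys %data ) {    # sort is an assumption, for repeatability
    $groups[$i]{$key} = $data{$key};
    $i = ( $i + 1 ) % $num_groups;
}

# Hypothetical stand-in for process(): just reports the slice size.
sub process {
    my ($slice) = @_;
    return scalar keys %$slice;
}

# Each element of @groups is already a hash reference.
my @sizes = map { process($_) } @groups;
```

Note that if there are fewer keys than groups, @groups ends up with fewer than $num_groups elements, so the caller should loop over @groups rather than over 0 .. $num_groups-1.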

The following is probably a bit faster, especially for large inputs:

my %data       = ...;
my $num_groups = ...;

my @groups;
our @keys; local *keys = sub { \@_ }->( keys(%data) );
my $r = @keys % $num_groups;
my $group_size = ( @keys - $r ) / $num_groups;
for my $i (0..$num_groups-1) {
   our @group_keys; local *group_keys = sub { \@_ }->(
      splice(@keys, 0, $group_size + ( $i < $r ? 1 : 0 ))
   );
   my %group;
   @group{@group_keys} = @data{@group_keys};
   push @groups, \%group;
}

Notes:

  1. our @a; local *a = sub { \@_ }->( LIST );
    

    is similar to

    my @a = LIST;
    

    except the elements of @a are the actual scalars returned by LIST, not copies of them.

  2. Since 5.20,

    my %group;
    @group{@group_keys} = @data{@group_keys};
    push @groups, \%group;
    

    can be written

    push @groups, { %data{@group_keys} };
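Both notes can be tried out in isolation. The sketch below assumes Perl 5.20 or later and uses the question's sample data; the variable names (@values, @alias, %kv) are made up for the illustration.

```perl
use 5.020;
use warnings;

# Note 1: the array built from \@_ aliases the original scalars.
my @values = ( 'a', 'b' );
our @alias; local *alias = sub { \@_ }->( @values );
$alias[0] = 'changed';          # writes through to $values[0]

# Note 2: classic hash slice assignment versus a key/value slice.
my %data = (
    312 => 'foobar',
    112 => 'qwerty',
    232 => 'test123',
    221 => 'asdfg',
);
my @group_keys = ( '312', '112' );

my %group;
@group{@group_keys} = @data{@group_keys};   # works on any Perl 5

my %kv = %data{@group_keys};                # key/value slice, 5.20+
```

After this runs, %group and %kv hold the same two pairs, and $values[0] has been modified through the alias.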
    

Upvotes: 0

Sobrique

Reputation: 53478

Can I suggest that you don't need to do that? Why not instead use something like Parallel::ForkManager and just spawn a new fork for each key, limiting the concurrency separately?

E.g.

#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;

# Assumes %inputData and process() are defined as in the question.
my $fm = Parallel::ForkManager -> new ( 3 );   # at most 3 concurrent children

foreach my $key ( keys %inputData ) {
   $fm -> start and next;
   process ( $inputData{$key} );
   $fm -> finish;
}

$fm -> wait_all_children();

This sets your concurrency limit to 3, but spawns a new fork per element, and lets you trivially scale 'wider' by just changing that concurrency number.
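For readers without the module installed, the same idea (one fork per element, with a separate cap on how many run at once) can be approximated with core fork() and wait(). This is only a sketch of the concept, not how Parallel::ForkManager is implemented; the process() stub, the reap_one() helper, and the counters are hypothetical names.

```perl
use strict;
use warnings;

# Sample data taken from the question.
my %inputData = (
    312 => 'foobar',
    112 => 'qwerty',
    232 => 'test123',
    221 => 'asdfg',
);
my $max_children = 3;    # concurrency limit

# Hypothetical stand-in for the question's process() function.
sub process {
    my ($value) = @_;
    return length $value;
}

my $running  = 0;    # children currently alive
my $finished = 0;    # children that exited with status 0

# Block until one child exits, then update the counters.
sub reap_one {
    my $pid = wait();
    if ( $pid == -1 ) {    # no children left to wait for
        $running = 0;
        return;
    }
    $running--;
    $finished++ if $? == 0;
}

for my $key ( sort keys %inputData ) {
    # At the limit: wait for a free slot before forking again.
    reap_one() if $running >= $max_children;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {     # child: do the work, then exit
        process( $inputData{$key} );
        exit 0;
    }
    $running++;            # parent: one more child alive
}

# Wait for the remaining children.
reap_one() while $running > 0;
```

Changing $max_children is the only adjustment needed to run more or fewer children at once; the data never has to be split into slices.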

Otherwise, I'd consider switching to threads and feeding elements to multiple worker threads via a Thread::Queue.

Upvotes: 3

Borodin

Reputation: 126722

You can't create a smaller hash that is a subset of another without building it in some way, as you have written.

It is probably best to pass the entire hash together with a list of keys to be processed, like this:

process( \%input_data, '312', '112', '232' );
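On the receiving side, a process() that accepts the whole hash plus a key list might look roughly like the sketch below. The body of process() is a hypothetical placeholder (it just formats key=value strings), and the sample data comes from the question.

```perl
use strict;
use warnings;

# Sample data taken from the question.
my %input_data = (
    312 => 'foobar',
    112 => 'qwerty',
    232 => 'test123',
    221 => 'asdfg',
);

# Hypothetical process(): first argument is a reference to the full
# hash, the remaining arguments are the keys this call should handle.
sub process {
    my ( $data, @keys ) = @_;
    my @report;
    for my $key (@keys) {
        push @report, "$key=$data->{$key}";
    }
    return @report;
}

my @report = process( \%input_data, '312', '112' );
```

No copy of the data is made here; only the reference and the short key list are passed.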

You could use slices to build your smaller hash, like this:

my @keys = ( '312', '112', '232' );
my %subset;
@subset{@keys} = @input_data{@keys};
process(\%subset);

Also, you should avoid capital letters in lexical identifiers. Capitals are reserved for global identifiers such as Package::Names, and some serious clashes can happen if you also use them for local variables and subroutines.

Upvotes: 1
