Yash

Reputation: 3114

How to run a perl script in parallel

How can I run a Perl script in parallel with different input params each time:

Illustration:

perl example.pl param1 param2
perl example.pl param3 param4

I want to run the Perl script example.pl two or more times, each with a different set of input params, and every run should execute in parallel.

A sample algorithm is as follows:

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
foreach my $entry (@all_params)
    {
      system("perl example.pl $entry");
    }

I want the script to run in parallel on each loop iteration.

Upvotes: 3

Views: 6758

Answers (2)

Mark Setchell

Reputation: 207465

There's no real need to write any code (Perl or otherwise) to run your scripts in parallel. You can just use GNU Parallel and control how many jobs run at a time, how many different servers the scripts are run across, where the results go, and just about every other aspect.

So, if you have a file called params.txt which contains:

param1 param2
param3 param4

you can just do this in the Terminal:

parallel --colsep ' ' -a params.txt perl example.pl {1} {2}

If you want a progress bar, just add --bar:

parallel --bar ...

If you want to run exactly 8 at a time:

parallel -j 8 ...

If you want to see what it would do without actually doing anything:

parallel --dry-run ...

Upvotes: 7

Sobrique

Reputation: 53478

You're asking about something that seems pretty simple, but is actually altogether more complicated than it looks.

It's not too hard to parallelise in Perl, but ... here be dragons. Parallel code introduces a whole new set of bugs and race conditions as your program becomes non-deterministic. You can no longer know the sequence of execution reliably. (And if you assume that you do, you'll create a race condition.)

But with that in mind - there are really three (ish?) ways to go about it.

Fork

Use Parallel::ForkManager and enclose your inner loop in a fork. This works nicely for 'simple' parallelism, but communicating between your forks is difficult.

#!/usr/bin/env perl

use strict;
use warnings;

use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new(2);    #2 concurrent

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );

foreach my $entry (@all_params) {
   $manager->start and next;    # parent: skip ahead to spawn the next child
   # your code to run in parallel here;
   print $entry;
   $manager->finish;
}
$manager->wait_all_children;    # reap every child before exiting

You can roll your own using fork, but you're probably going to trip over doing that. So Parallel::ForkManager is the tool for the job.
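For comparison, here is a minimal hand-rolled sketch of the bookkeeping that Parallel::ForkManager does for you: fork each child, do the work in the child, and have the parent reap every child with waitpid (the printed message is just a placeholder for real work):

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my @all_params = split /,/, '1,2,3,4,5';
my @pids;

foreach my $entry (@all_params) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child process: do the work, then exit so we don't
        # fall back into the parent's loop.
        print "child $$ handling $entry\n";
        exit 0;
    }
    push @pids, $pid;    # parent: remember the child so we can reap it
}

# Reap every child; skipping this leaves zombie processes behind.
waitpid( $_, 0 ) for @pids;
```

Note there is no limit on concurrency here and no easy way to get data back from the children - exactly the parts Parallel::ForkManager adds for you.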

Thread:

#!/usr/bin/env perl

use strict;
use warnings;

use threads;
use Thread::Queue;

my $work_q = Thread::Queue->new;

sub worker {
   # dequeue returns undef once the queue has been end()ed and drained
   while ( defined( my $item = $work_q->dequeue ) ) {
      print $item, "\n";
   }
}

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
$work_q->enqueue(@all_params);
$work_q->end;

threads->create( \&worker ) for 1 .. 2;    #2 in parallel
foreach my $thr ( threads->list ) {
   $thr->join;
}

This is more suitable if you need to do more IPC - threading is (IMO) generally better for that. However, you shouldn't treat threads as lightweight (like forks), because despite what you may expect from other languages, Perl threading doesn't work like that.
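As a sketch of that IPC, a second Thread::Queue can carry results back to the parent. The doubling here is just a placeholder for real work:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

sub worker {
    # dequeue returns undef once the queue has been end()ed and drained
    while ( defined( my $item = $work_q->dequeue ) ) {
        $result_q->enqueue( $item * 2 );    # placeholder "work"
    }
}

$work_q->enqueue( 1 .. 5 );
$work_q->end;

my @workers = map { threads->create( \&worker ) } 1 .. 2;    # 2 in parallel
$_->join for @workers;
$result_q->end;

# Drain the result queue in the parent (non-blocking, since all
# workers have finished and the queue is ended).
my @results;
while ( defined( my $r = $result_q->dequeue_nb ) ) {
    push @results, $r;
}

print join( ' ', sort { $a <=> $b } @results ), "\n";
```

Thread::Queue handles the locking for you, so the workers can enqueue results without any explicit synchronisation.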

Using IO::Select and multiple open calls to parallelise:

#!/usr/bin/env perl

use strict;
use warnings;

use IO::Select;

my $select = IO::Select->new;

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );

foreach my $param (@all_params) {
   open( my $io, '-|', "program_name $param" ) or die "open failed: $!";
   $select->add($io);
}

while ( my @ready = $select->can_read ) {
   foreach my $fh (@ready) {
      my $line = <$fh>;
      if ( defined $line ) {
         print $line;
      }
      else {
         $select->remove($fh);    # EOF - stop watching this handle
         close $fh;
      }
   }
}

You can do something similar via IPC::Open3 to get separate file descriptors for the child's STDIN, STDOUT, and STDERR.
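For instance, a minimal IPC::Open3 sketch. It uses $^X (the currently running perl binary) as the child command so the example is self-contained; with a real program you'd substitute your own command and arguments:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use IPC::Open3;
use Symbol 'gensym';

# open3 autovivifies the STDIN/STDOUT handles, but the STDERR slot
# must be a real glob (gensym), or stderr gets merged into stdout.
my $err = gensym;
my $pid = open3( my $in, my $out, $err,
    $^X, '-e', 'print "to stdout\n"; print STDERR "to stderr\n"' );

close $in;    # nothing to send on the child's STDIN

my $stdout_line = <$out>;
my $stderr_line = <$err>;
waitpid $pid, 0;    # reap the child

print "child said: $stdout_line";
print "child warned: $stderr_line";
```

Be aware that reading the two streams strictly in sequence like this can deadlock if the child writes a lot to STDERR before STDOUT is drained; for anything non-trivial, multiplex the handles with IO::Select as above.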

Should I?

Parallel code isn't a magic bullet. What it does is reduce blocking and let you consume more of the available resources. If your limiting resource is CPU, and you have 10 CPUs, then running 10 in parallel is going to speed you up.

... but if your limiting resource is IO - network or disk bandwidth - it often doesn't help, because contention actually makes the problem worse. Disk controllers in particular already parallelise, prefetch and cache quite efficiently, so your gains from hitting them in parallel are often quite marginal.

Upvotes: 10
