Yash

Reputation: 3114

How to run a perl script in parallel

How can I run a Perl script in parallel with different input params each time:

Illustration:

perl example.pl param1 param2
perl example.pl param3 param4

I want to run the Perl script example.pl two or more times, each with a different set of input params, and every run should execute in parallel.

A sample algorithm is as follows:

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
foreach my $entry (@all_params)
    {
      system("perl example.pl $entry");
    }

I want the script to run in parallel on each loop iteration.

Upvotes: 3

Views: 6758

Answers (2)

Mark Setchell

Reputation: 207465

There's no real need to write any code (Perl or otherwise) to run your scripts in parallel. You can just use GNU Parallel and control how many jobs run at a time, how many different servers the scripts are run across, where the results go, and just about every other aspect.

So, if you have a file called params.txt which contains:

param1 param2
param3 param4

you can just do this in the Terminal:

parallel --colsep ' ' -a params.txt perl example.pl {1} {2}

If you want a progress bar, just add --bar:

parallel --bar ...

If you want to run exactly 8 at a time:

parallel -j 8 ...

If you want to see what it would do without actually doing anything:

parallel --dry-run ...

Upvotes: 7

Sobrique

Reputation: 53478

You're asking about something that seems pretty simple, but is actually altogether more complicated than it looks.

It's not too hard to parallelise in Perl, but ... here be dragons. Parallel code introduces a whole new set of bugs and race conditions as your program becomes non-deterministic. You can no longer know the sequence of execution reliably. (And if you assume that you do, you'll create a race condition.)

But with that in mind - there are really three (ish?) ways to go about it.

Fork

Use Parallel::ForkManager and enclose your inner loop in a fork. This works nicely for 'simple' parallelism, but communicating between your forks is difficult.

#!/usr/bin/env perl

use strict;
use warnings;

use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new(2);    #2 concurrent

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );

foreach my $entry (@all_params) {
   $manager->start and next;    # parent: skip ahead to spawn the next child
   # your code to run in parallel here;
   print $entry;
   $manager->finish;
}
$manager->wait_all_children;    # reap every child before exiting

You can roll your own using fork, but you're probably going to trip over doing that. So Parallel::ForkManager is the tool for the job.
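For comparison, here is a minimal hand-rolled sketch of the bookkeeping that Parallel::ForkManager does for you: fork each child, do the work in the child, and have the parent reap every child with waitpid (the printed message is just a placeholder for real work):

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my @all_params = split /,/, '1,2,3,4,5';
my @pids;

foreach my $entry (@all_params) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child process: do the work, then exit so we don't
        # fall back into the parent's loop.
        print "child $$ handling $entry\n";
        exit 0;
    }
    push @pids, $pid;    # parent: remember the child so we can reap it
}

# Reap every child; skipping this leaves zombie processes behind.
waitpid( $_, 0 ) for @pids;
```

Note there is no limit on concurrency here and no easy way to get data back from the children - exactly the parts Parallel::ForkManager adds for you.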

Thread:

#!/usr/bin/env perl

use strict;
use warnings;

use threads;
use Thread::Queue;

my $work_q = Thread::Queue->new;

sub worker {
   # dequeue returns undef once the queue has been end()ed and drained
   while ( defined( my $item = $work_q->dequeue ) ) {
      print $item, "\n";
   }
}

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
$work_q->enqueue(@all_params);
$work_q->end;

threads->create( \&worker ) for 1 .. 2;    #2 in parallel
foreach my $thr ( threads->list ) {
   $thr->join;
}

This is more suitable if you need to do more IPC - threading is (IMO) generally better for that. However, you shouldn't treat threads as lightweight (like forks), because despite what you may expect from other languages, Perl threading doesn't work like that.
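As a sketch of that IPC, a second Thread::Queue can carry results back to the parent. The doubling here is just a placeholder for real work:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

sub worker {
    # dequeue returns undef once the queue has been end()ed and drained
    while ( defined( my $item = $work_q->dequeue ) ) {
        $result_q->enqueue( $item * 2 );    # placeholder "work"
    }
}

$work_q->enqueue( 1 .. 5 );
$work_q->end;

my @workers = map { threads->create( \&worker ) } 1 .. 2;    # 2 in parallel
$_->join for @workers;
$result_q->end;

# Drain the result queue in the parent (non-blocking, since all
# workers have finished and the queue is ended).
my @results;
while ( defined( my $r = $result_q->dequeue_nb ) ) {
    push @results, $r;
}

print join( ' ', sort { $a <=> $b } @results ), "\n";
```

Thread::Queue handles the locking for you, so the workers can enqueue results without any explicit synchronisation.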

Using IO::Select and multiple open calls to parallelise:

#!/usr/bin/env perl

use strict;
use warnings;

use IO::Select;

my $select = IO::Select->new;

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );

foreach my $param (@all_params) {
   open( my $io, '-|', "program_name $param" ) or die "open failed: $!";
   $select->add($io);
}

while ( my @ready = $select->can_read ) {
   foreach my $fh (@ready) {
      my $line = <$fh>;
      if ( defined $line ) {
         print $line;
      }
      else {
         $select->remove($fh);    # EOF - stop watching this handle
         close $fh;
      }
   }
}

You can do something similar via IPC::Open3 to get separate file descriptors for the child's STDIN, STDOUT, and STDERR.
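For instance, a minimal IPC::Open3 sketch. It uses $^X (the currently running perl binary) as the child command so the example is self-contained; with a real program you'd substitute your own command and arguments:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use IPC::Open3;
use Symbol 'gensym';

# open3 autovivifies the STDIN/STDOUT handles, but the STDERR slot
# must be a real glob (gensym), or stderr gets merged into stdout.
my $err = gensym;
my $pid = open3( my $in, my $out, $err,
    $^X, '-e', 'print "to stdout\n"; print STDERR "to stderr\n"' );

close $in;    # nothing to send on the child's STDIN

my $stdout_line = <$out>;
my $stderr_line = <$err>;
waitpid $pid, 0;    # reap the child

print "child said: $stdout_line";
print "child warned: $stderr_line";
```

Be aware that reading the two streams strictly in sequence like this can deadlock if the child writes a lot to STDERR before STDOUT is drained; for anything non-trivial, multiplex the handles with IO::Select as above.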

Should I?

Parallel code isn't a magic bullet. What it does is reduce blocking and let you consume more of the available resources. If your limiting resource is CPU, and you have 10 CPUs, then running 10 in parallel is going to speed you up.

... but if your limiting resource is IO - network or disk bandwidth - it often doesn't help, because contention actually makes the problem worse. Disk controllers in particular already parallelise, prefetch and cache quite efficiently, so your gains from hitting them in parallel are often quite marginal.

Upvotes: 10
