Reputation: 3114
How can I run a Perl script in parallel with different input params each time?
Illustration:
perl example.pl param1 param2
perl example.pl param3 param4
I want to run the Perl script example.pl
two or more times with different input params.
Each run should happen in parallel.
A sample algorithm is as under:
my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
foreach my $entry (@all_params)
{
    system( 'perl', 'example.pl', $entry );
}
I want to run the Perl script in parallel for each loop iteration.
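For reference, here is the loop above made runnable with system() (the -e one-liner is a stand-in for example.pl). Note that system() blocks until each child exits, so this version still runs the children one at a time, not in parallel:

```perl
#!/usr/bin/env perl
# Sequential baseline: system() waits for each child before starting
# the next one, so nothing runs in parallel yet.
use strict;
use warnings;

my $params     = '1,2,3,4,5';
my @all_params = split /,/, $params;
my $ran        = 0;    # count successful runs

foreach my $entry (@all_params) {
    # $^X is the perl running this script; the -e one-liner
    # stands in for example.pl here.
    my $rc = system( $^X, '-e', 'print "ran with: $ARGV[0]\n"', $entry );
    $ran++ if $rc == 0;
}
```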
Upvotes: 3
Views: 6758
Reputation: 207465
There's no real need to write any code (Perl or otherwise) to run your scripts in parallel: you can just use GNU Parallel and control how many run at a time, how many different servers the scripts are run across, where the results go, and just about any other aspect.
So, if you have a file called params.txt which contains:
param1 param2
param3 param4
you can just do this in the Terminal:
parallel --colsep ' ' -a params.txt perl example.pl {1} {2}
If you want a progress bar, just add --bar:
parallel --bar ...
If you want to run exactly 8 at a time:
parallel -j 8 ...
If you want to see what it would do without actually doing anything:
parallel --dry-run ...
Upvotes: 7
Reputation: 53478
You're asking about something that seems pretty simple, but is actually altogether more complicated than it seems.
It's not too hard to parallelise in Perl, but ... here be dragons. Parallel code introduces a whole new set of bugs and race conditions as your program becomes non-deterministic. You can no longer know the sequence of execution reliably. (And if you assume that you do, you'll create a race condition.)
But with that in mind, there are really 3 (ish?) ways to go about it.
Use Parallel::ForkManager and enclose your inner loop in a fork. This works nicely for 'simple' parallelism, but communicating between your forks is difficult.
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new(2);    # 2 concurrent
my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
foreach my $entry (@all_params) {
    $manager->start and next;
    # your code to run in parallel here
    print $entry;
    $manager->finish;
}
$manager->wait_all_children;
You can roll your own using fork, but you're probably going to trip over doing that, so Parallel::ForkManager is the tool for the job.
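To illustrate, a minimal roll-your-own sketch with fork and waitpid (this is the bookkeeping Parallel::ForkManager handles for you, and note there is no cap on how many children run at once here):

```perl
#!/usr/bin/env perl
# Roll-your-own parallelism: fork a child per work item, then reap
# them all with waitpid so none are left as zombies.
use strict;
use warnings;

my @all_params = split /,/, '1,2,3,4,5';
my @pids;

foreach my $entry (@all_params) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                  # child
        print "child $$ got $entry\n";
        exit 0;
    }
    push @pids, $pid;                   # parent remembers the pid
}

waitpid( $_, 0 ) for @pids;             # reap every child
```

The second approach uses threads with Thread::Queue: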
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_q = Thread::Queue->new;

sub worker {
    while ( defined( my $item = $work_q->dequeue ) ) {
        print $item, "\n";
    }
}

my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
$work_q->enqueue(@all_params);
$work_q->end;
threads->create( \&worker ) for 1 .. 2;    # 2 in parallel
foreach my $thr ( threads->list ) {
    $thr->join;
}
This is more suitable if you need to do more IPC - threading is (IMO) generally better for that. However, you shouldn't treat threads as lightweight (like forks), because despite what you may think from other languages, Perl threading doesn't work like that.
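As a sketch of that IPC point, workers can hand results back through a second Thread::Queue (this assumes your perl was built with thread support; check with perl -V:useithreads):

```perl
#!/usr/bin/env perl
# IPC between threads via a second queue: workers push results back
# to the parent instead of just printing.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

sub worker {
    while ( defined( my $item = $work_q->dequeue ) ) {
        $result_q->enqueue( $item * 2 );    # send a result back
    }
}

$work_q->enqueue( 1 .. 5 );
$work_q->end;

my @threads = map { threads->create( \&worker ) } 1 .. 2;
$_->join for @threads;
$result_q->end;

my @results;
while ( defined( my $r = $result_q->dequeue_nb ) ) {
    push @results, $r;
}
print join( ',', sort { $a <=> $b } @results ), "\n";    # 2,4,6,8,10
```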
You can use piped open calls to parallelise:
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Select;
my $params = '1,2,3,4,5';
my @all_params = split( /,/, $params );
my $select = IO::Select->new;
foreach my $param (@all_params) {
    open( my $io, '-|', "program_name $param" ) or die "open failed: $!";
    $select->add($io);
}
while ( $select->count ) {
    foreach my $fh ( $select->can_read ) {
        my $line = <$fh>;
        if ( defined $line ) {
            print $line;
        }
        else {
            $select->remove($fh);    # child finished - stop polling it
            close $fh;
        }
    }
}
You can do something similar via the core IPC::Open3 module
to open file handles for the child's STDIN, STDOUT and STDERR.
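A short sketch with the core IPC::Open3 module, which gives you handles on a child's STDIN, STDOUT and STDERR all at once (the -e one-liner child here is just a stand-in):

```perl
#!/usr/bin/env perl
# IPC::Open3 sketch: capture a child's STDOUT and STDERR separately.
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

my $err = gensym;    # open3 won't autovivify the STDERR handle
my $pid = open3( my $in, my $out, $err,
    $^X, '-e', 'print "out\n"; warn "oops\n"' );

close $in;           # nothing to send to the child
my $stdout = do { local $/; <$out> };    # slurp child's STDOUT
my $stderr = do { local $/; <$err> };    # slurp child's STDERR
waitpid $pid, 0;                         # reap the child

print "STDOUT: $stdout";    # STDOUT: out
print "STDERR: $stderr";    # STDERR: oops
```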
Parallel code isn't a magic bullet. What it does is reduce blocking and let you use more of your available resources. If your limiting resource is CPU, and you have 10 CPUs, then using 10 in parallel is going to speed you up.
... but if your limiting resource is IO - network or disk bandwidth - it often doesn't help, because contention actually makes the problem worse. Disk controllers in particular already parallelise, prefetch and cache quite efficiently, so your gains from hitting them in parallel are often quite marginal.
Upvotes: 10