Reputation: 11
I have 500 files to read, but reading each file recursively takes approximately 2 minutes. So I want to do this operation in parallel using Perl. How can I do that?
Upvotes: 1
Views: 773
Reputation: 385657
You're talking about a massive amount of reading if it takes two minutes per file. You're basically spending your time waiting for the hard drive. Are the files on separate hard drives? If not, why do you think that trying to read a second file at the same time is going to be faster? In fact, it might make things slower by increasing the amount of seeking the hard drive has to do.
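One way to check that claim is to time a single read and compare wall-clock time against CPU time; if the CPU sits mostly idle, the disk is the bottleneck and parallel reads from the same drive won't help. A minimal sketch using the core Time::HiRes module (the file name is a placeholder):

use strict;
use warnings;
use Time::HiRes qw( time );

my $qfn = 'one-of-the-500-files';  # placeholder path

my $start = time();
open(my $fh, '<', $qfn) or die "Can't open $qfn: $!";
1 while <$fh>;   # read and discard every line
close($fh);

my ($user, $system) = times();
printf "wall %.2f s, cpu %.2f s\n", time() - $start, $user + $system;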
But if you want to try it anyway,
use strict;
use warnings;

use threads;
use Thread::Queue qw( );

use constant NUM_WORKERS => 4;  # Twiddle this

sub run {
    my ($qfn) = @_;
    ...read file $qfn here...
}

my @qfns = ...;  # the paths of the 500 files to read (see the sketch below)

my $q = Thread::Queue->new();

# Spawn the workers; each one pulls file names off the queue
# until it receives the undef terminator.
my @threads;
for (1..NUM_WORKERS) {
    push @threads, async {
        while (my $job = $q->dequeue()) {
            run($job);
        }
    };
}

$q->enqueue($_) for @qfns;        # hand every file name to the workers
$q->enqueue(undef) for @threads;  # one terminator per worker
$_->join() for @threads;          # wait for them all to finish
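For reference, a minimal sketch of the two pieces the snippet leaves open, populating @qfns and the body of run(). Both the data/ directory and the line-counting body are assumptions, stand-ins for wherever your files actually live and whatever your real per-file work is:

use strict;
use warnings;

# Assumption: the 500 input files live under data/
my @qfns = glob('data/*');

# Hypothetical run() body: counts lines as a stand-in for the real work
sub run {
    my ($qfn) = @_;
    open(my $fh, '<', $qfn) or die "Can't open $qfn: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close($fh);
    print "$qfn: $lines lines\n";
}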
Upvotes: 2
Reputation: 97948
Create a Perl script to process a single file. Create a shell script, batch-run.sh, that contains 500 lines (one line per file, like perl perl-script.pl file001). Then create another shell script that launches the required number of background processes to execute the lines from batch-run.sh. You may want to limit the number of background processes, though. Something like this:
NCPUS=32                                   # number of parallel processes
ISCRIPT=batch-run.sh
NTASKS=$(wc -l $ISCRIPT | cut -d' ' -f1)   # total number of commands

# Worker $1 runs lines $1, $1+NCPUS, $1+2*NCPUS, ... of the batch file.
runbatch() {
    OFFSET=$1
    while [ $OFFSET -le $NTASKS ]; do
        CMD=$(sed "${OFFSET}q;d" $ISCRIPT)   # extract line number $OFFSET
        echo "$CMD ..."
        eval $CMD
        let OFFSET+=$NCPUS
    done
}

for i in $(seq 1 $NCPUS); do
    runbatch $i &                            # launch each worker in the background
done
wait                                         # block until all workers finish
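For completeness, batch-run.sh itself can be generated with a few lines of Perl; a minimal sketch, where perl-script.pl is the per-file script named above and the data/ directory is an assumption about where the 500 files live:

use strict;
use warnings;

# Assumption: the 500 input files live under data/
open(my $out, '>', 'batch-run.sh') or die "Can't create batch-run.sh: $!";
print $out "perl perl-script.pl $_\n" for glob('data/*');
close($out);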
Upvotes: 0