Anusha

Reputation: 11

Reading files recursively in parallel in Perl

I have 500 files to read recursively, and reading each file takes approximately 2 minutes, so I want to do this operation in parallel using Perl. How can I do that?

Upvotes: 1

Views: 773

Answers (2)

ikegami

Reputation: 385657

You're talking about a massive amount of reading if it takes two minutes per file. You're basically spending your time waiting for the hard drive. Are the files on separate hard drives? If not, why do you think that trying to read a second file at the same time is going to be faster? In fact, it might make things slower by increasing the amount of seeking the hard drive has to do.

But if you want to try it anyway,

use strict;
use warnings;

use threads;
use Thread::Queue qw( );

use constant NUM_WORKERS => 4;  # Twiddle this

sub run {
   my ($qfn) = @_;
   ...;  # read file $qfn here
}

# File names to process (here assumed to come from the command line).
my @qfns = @ARGV;

my $q = Thread::Queue->new();

# Start the workers. Each one pulls file names off the queue
# until it receives undef, which signals there is no more work.
my @threads;
for (1..NUM_WORKERS) {
   push @threads, async {
      while (my $job = $q->dequeue()) {
         run($job);
      }
   };
}

# Queue up the work, then one undef per worker to shut them down.
$q->enqueue($_) for @qfns;
$q->enqueue(undef) for @threads;

$_->join() for @threads;
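
As a rough sketch of what the elided body of run() might look like, assuming each file simply needs to be read line by line (process_line is a hypothetical placeholder for whatever per-line work you actually do):

sub run {
   my ($qfn) = @_;
   open(my $fh, '<', $qfn)
      or do { warn("Can't open \"$qfn\": $!\n"); return; };
   while (my $line = <$fh>) {
      process_line($qfn, $line);   # hypothetical per-line work
   }
   close($fh);
}

Keep in mind that threads don't share data by default, so anything run() computes that the main thread needs back has to travel through another Thread::Queue or a shared variable.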

Upvotes: 2

perreal

Reputation: 97948

Create a Perl script that processes a single file. Then create a shell script, batch-run.sh, containing 500 lines of the form perl perl-script.pl file001. Finally, create another shell script that launches the required number of background processes to execute the lines from batch-run.sh. You will probably want to limit the number of background processes, though. Something like this:

NCPUS=32                      # number of parallel processes
ISCRIPT=batch-run.sh
NTASKS=$(wc -l < "$ISCRIPT")  # number of lines (tasks) in batch-run.sh

# Run every NCPUS-th line of batch-run.sh, starting at line $1.
runbatch() {
    OFFSET=$1
    while [ "$OFFSET" -le "$NTASKS" ]; do
        CMD=$(sed "${OFFSET}q;d" "$ISCRIPT")  # extract line number OFFSET
        echo "$CMD ..."
        eval "$CMD"
        let OFFSET+=$NCPUS
    done
}

# Launch NCPUS workers in the background, each on a disjoint set of lines.
for i in $(seq 1 $NCPUS); do
    runbatch $i &
done
wait  # block until all workers have finished
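
To generate batch-run.sh in the first place, a tiny Perl helper is enough. This is only a sketch: it assumes your processing script is called perl-script.pl and that the files to read are passed on its command line (e.g. perl write_batch.pl data/*.txt > batch-run.sh):

use strict;
use warnings;

# Emit one "perl perl-script.pl <file>" line per input file.
print "perl perl-script.pl $_\n" for @ARGV;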

Upvotes: 0
