Reputation:
I am trying to figure out how to properly use Parallel::ForkManager (FM) in part of a project I am working on, but have run into a situation where FM spawns processes and appears to be doing work, yet takes forever to finish.
However, when I run the same code in debug mode (by setting the maximum number of processes to 0), it completes within a reasonable and expected timeframe.
Here is the code I am having trouble with...
use strict;
use warnings;

use Parallel::ForkManager;

sub read_table {
    # takes a filename and reads in a CSV file;
    # works fine and is omitted here
}

sub foo {
    # originally an Inline::C subroutine;
    # for the purpose of debugging, replaced with rand
    return rand;
}

my $cpu_count = 0;
my $epsilon   = 1e-16;

my @tt = read_table('tt.csv');
my @tc = read_table('tc.csv');
my @nm = ($epsilon) x scalar @tc;
my @results;

my $pm = Parallel::ForkManager->new($cpu_count);
$pm->run_on_finish(sub {
    my $i = $_[1];    # child's exit code, used to pass the row index
    my $m = $_[5];    # reference to the data structure from the child
    $results[$i] = $m;
});

foreach my $i (0 .. $#tt) {
    $pm->start and next;
    my @r;
    if (scalar @{ $tt[$i] } > 1) {
        foreach my $j (0 .. $#tc) {
            if (scalar @{ $tc[$j] } > 1) {
                push @r, foo(scalar @{ $tt[$i] }, scalar @{ $tc[$j] }, \@{ $tt[$i] }, \@{ $tc[$j] });
            } else {
                push @r, $epsilon;
            }
        }
    } else {
        @r = @nm;
    }
    $pm->finish($i, [@r]);
    undef @r;
}
$pm->wait_all_children;
So with $cpu_count set to 0, the process completes without a problem: the original C code finishes in a couple of minutes (and with sub foo { return rand; } in only ~2 seconds). But when FM is turned on, it seems to run for a very long time. It does appear to be making progress, though; the print statements I added to diagnose the problem, like print "at rows $i and $j", kept appearing.
The runtime was the same when I took out all the FM-related code and just used a plain pair of nested foreach loops instead.
Thanks in advance!
Upvotes: 2
Views: 80
Reputation: 385764
It's because your worker does so little work that the overhead of creating the process and, more significantly, of transferring the data back to the parent is larger than the actual workload.
Suggestions:
- Don't create a child for each element of @tt; instead, assign a group of elements to each child.
- Don't fork at all when @{$tt[$i]} is empty; handle that case directly in the parent.
choroba's solution reduces the overhead, but keeps the inefficiencies of the original program. It could be made much faster by also implementing the suggestions above.
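A minimal sketch of the chunking idea (the data and work_on_row below are hypothetical stand-ins; it assumes Parallel::ForkManager is installed). Note that it returns the chunk's start index inside the data structure rather than through the exit code, since exit codes are limited to 0-255:

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use Parallel::ForkManager;

# Hypothetical stand-ins for the real data and worker.
my @tt = map { [ map { rand } 1 .. 10 ] } 1 .. 1000;
sub work_on_row { my ($row) = @_; return scalar @$row }

my $cpu_count  = 4;
my $chunk_size = ceil(@tt / $cpu_count);    # one chunk per worker
my @results;

my $pm = Parallel::ForkManager->new($cpu_count);
$pm->run_on_finish(sub {
    my ($pid, $exit, $id, $sig, $core, $data) = @_;
    # $data is [ $start_index, \@chunk_results ]
    my ($start, $r) = @$data;
    @results[ $start .. $start + $#$r ] = @$r;
});

for (my $start = 0; $start < @tt; $start += $chunk_size) {
    my $end = $start + $chunk_size - 1;
    $end = $#tt if $end > $#tt;
    $pm->start and next;
    my @r = map { work_on_row($tt[$_]) } $start .. $end;
    $pm->finish(0, [ $start, \@r ]);    # one transfer per chunk, not per row
}
$pm->wait_all_children;
```

This forks only $cpu_count times and serializes only $cpu_count result structures, instead of once per row of @tt.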
By the way, $pm->finish($i, [@r]) is better written as $pm->finish($i, \@r). There's no need to create a new array.
Upvotes: 0
Reputation: 241858
That's because data sent from a child to the parent is written to disk (see RETRIEVING DATASTRUCTURES in Parallel::ForkManager):
The data structure referenced in a given child process is serialized and written out to a file by Storable. The file is subsequently read back into memory and a new data structure belonging to the parent process is created. Please consider the performance penalty it can imply, so try to keep the returned structure small.
In debugging mode, no fork happens, so the structure can be passed directly without being saved and loaded.
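What RETRIEVING DATASTRUCTURES describes is essentially a Storable round-trip through a temporary file. A minimal sketch of that cost path, using the core Storable module directly rather than ForkManager's internals:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# A result structure comparable in shape to one child's @r.
my $r = [ map { rand } 1 .. 10_000 ];

# Child side: serialize the structure out to a temporary file.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
store($r, $file);

# Parent side: read it back in as a brand-new structure.
my $copy = retrieve($file);

print scalar @$copy, "\n";                                     # same contents...
print +($copy == $r ? "same" : "different"), " reference\n";   # ...but a new reference
```

Every child pays this store/retrieve cost once per finish, which is why returning many small structures is so much slower than doing the same work in-process.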
Thread::Queue might produce better results.
#!/usr/bin/perl
use strict;
use warnings;

use threads;
use Thread::Queue;

sub read_table {
    map [ map rand, 1 .. 100 ], 1 .. 100;
}

sub foo {
    [ @_ ]
}

my $cpu_count = 20;
my $epsilon   = 1e-16;

my @tt = read_table('tt.csv');
my @tc = read_table('tc.csv');
my @nm = ($epsilon) x scalar @tc;
my @results;

my ($q_in, $q_out) = map 'Thread::Queue'->new, 1, 2;

my @workers = map threads->create(sub {
    while (defined(my $i = $q_in->dequeue)) {
        warn $i;
        my @r;
        if (scalar @{ $tt[$i] } > 1) {
            for my $j (0 .. $#tc) {
                if (scalar @{ $tc[$j] } > 1) {
                    push @r, foo(scalar @{ $tt[$i] }, scalar @{ $tc[$j] }, \@{ $tt[$i] }, \@{ $tc[$j] });
                } else {
                    push @r, $epsilon;
                }
            }
        } else {
            @r = @nm;
        }
        $q_out->enqueue([$i, @r]);
    }
}), 1 .. $cpu_count;

$q_in->enqueue(0 .. $#tt);
$q_in->end;

for (0 .. $#tt) {
    my $r = $q_out->dequeue;
    my $i = shift @$r;
    warn "$i: $r->[2][2][1]";
}
$_->join for @workers;
Upvotes: 3