Reputation: 1669
I'm working on a project implemented in Perl, and thought it would be a good idea to use threads to distribute the work, because the tasks can be done independently of each other and only read from shared data in memory. However, the performance is nowhere near what I expected. After some investigation I can only conclude that threads in Perl basically suck, but I keep wondering why the performance goes down the drain as soon as I introduce a single shared variable.
For example, this little program has nothing shared and consumes 75% of the CPU (as expected):
use threads;
sub fib {
    my ( $n ) = @_;
    if ( $n < 2 ) {
        return $n;
    } else {
        return fib( $n - 1 ) + fib( $n - 2 );
    }
}
my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );
$thr1->join;
$thr2->join;
$thr3->join;
And as soon as I introduce a shared variable $a, the CPU usage is somewhere between 40% and 50%:
use threads;
use threads::shared;
my $a : shared;
$a = 1000;
sub fib {
    my ( $n ) = @_;
    if ( $n < 2 ) {
        return $n;
    } else {
        return $a + fib( $n - 1 ) + fib( $n - 2 ); # <-- $a was added here
    }
}
my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );
$thr1->join;
$thr2->join;
$thr3->join;
So $a is read-only and no locking takes place, and yet the performance decreases. I'm curious why this happens.
At the moment I'm using Perl 5.10.1 under Cygwin on Windows XP. Unfortunately I couldn't test this on a non-Windows machine with a (hopefully) more recent Perl.
Upvotes: 4
Views: 2601
Reputation: 352
In Perl you can construct a shared object containing lots of data without worrying about extra copies. Spawning workers has no performance impact, because the shared data resides inside a separate thread or process, depending on whether threads are in use.
use MCE::Hobo; # threads, or another parallel module of your choice, works too
use MCE::Shared;
# The module option constructs the object under the shared-manager.
# There's no trace of data inside the main process. The construction
# returns a shared reference containing an id and class name.
my $data = MCE::Shared->share( { module => 'My::Data' } );
my $b;
sub fib {
    my ( $n ) = @_;
    if ( $n < 2 ) {
        return $n;
    } else {
        return $b + fib( $n - 1 ) + fib( $n - 2 );
    }
}
my @thrs;
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(1000); fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(2000); fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(3000); fib(35) } );
$_->join() for @thrs;
exit;
# Populate $self with data. When shared, the data resides under the
# shared-manager thread (via threads->create) or process (via fork).
package My::Data;
sub new {
    my $class = shift;
    my %self;
    %self = map { $_ => $_ } 1000 .. 5000;
    bless \%self, $class;
}
# Add any getter methods to suit the application. Supporting multiple
# keys helps reduce the number of trips via IPC. Serialization is
# handled automatically if a getter method returns a hash ref.
# MCE::Shared will use Sereal::{Encoder,Decoder} if available, which is faster.
sub get_keys {
    my $self = shift;
    if ( wantarray ) {
        return map { $_ => $self->{$_} } @_;
    } else {
        return $self->{$_[0]};
    }
}
1;
Upvotes: 0
Reputation: 182743
Your code is a tight loop around a synchronized structure. Optimize it by having each thread copy the shared variable -- just once for each thread -- into an unshared variable.
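Something along these lines should do it. This is only a rough sketch of the idea, not tested against your setup: the names $shared_a, $local_a and worker are made up for illustration, and I've passed the private copy down as an extra argument to fib and used a smaller n so it finishes quickly.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-in for the shared variable $a from the question.
my $shared_a : shared = 1000;

# fib takes the private copy as a second argument, so the recursion
# never touches the shared variable.
sub fib {
    my ( $n, $local_a ) = @_;
    return $n if $n < 2;
    return $local_a + fib( $n - 1, $local_a ) + fib( $n - 2, $local_a );
}

# Each thread reads the shared variable exactly once, then works
# only with its own unshared copy.
sub worker {
    my ( $n ) = @_;
    my $local_a = $shared_a;
    return fib( $n, $local_a );
}

my @threads = map { threads->create( \&worker, 25 ) } 1 .. 3;
my @results = map { $_->join } @threads;
print "$_\n" for @results;
```

The point is that the only access to the shared variable happens in worker, once per thread, instead of once per call to fib.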
Upvotes: 3