vp8
vp8

Reputation: 155

Which performs better - using grep vs for loop in perl?

Let's say If user wants to filter elements from an array based on a certain condition. User has two choices. Either he can use one line grep statement or write a for loop and store elements based on certain condition.

For e.g.

my @array = (1, 2, 3, 4, 5, 6, 7, 8);

Option 1 : my @filtered = grep { $_/2 eq 0 } @array;

Option 2 :

foreach (@array) 
{ 
  if($_/2 eq 0) { 
    push @filtered, $_;
  } 
}

Which is preferred way of writing if performance is a factor?

Upvotes: 1

Views: 1369

Answers (3)

zdim
zdim

Reputation: 66964

The first option has to be faster in general since grep has specific optimizations.

However, that statement is far too general and you need to benchmark it, in particular under conditions that apply to your work.

An example using Benchmark module

use warnings;
use strict;
use feature 'say';    
use Getopt::Long;
use Benchmark qw(cmpthese);

my $time = 3;
my $size = 10_000;

GetOptions( 'time=i' => \$time, 'size=i' => \$size )  or usage();

sub w_grep {
    my @res = grep { $_/2 > $size/4 } @{$_[0]};
    return \@res;
}

sub w_for {
    my @res;
    for (@{$_[0]}) { 
        push @res, $_ if $_/2 > $size/4; 
    }   
    return \@res;
}

my @t = 1..$size;

cmpthese( -$time, {
    grep => sub { w_grep(\@t) },
    for  => sub { w_for(\@t) },
});

sub usage {
    say STDERR "Usage: $0 [--time NUM] [--size NUM]";
    exit;
}

The condition is true about half of the time so a scalar gets created and added that often. Creating a lot of scalars is expensive and this is likely the main factor, how often your condition is true.

It prints (on a desktop with v5.16 under CentOS 7)

       Rate  for grep
for   948/s   -- -13%
grep 1085/s  14%   --

However, if we make the condition more restrictive, say $_/2 > $size/2, then we get

       Rate  for grep
for  1217/s   --  -6%
grep 1296/s   6%   --

while when it succeeds most of the time, for say $_/2 > 10, the rates are

       Rate  for grep
for   945/s   -- -28%
grep 1304/s  38%   --

This shows that for falls further behind as there is more additions to the array.

But please remember that this depends in many other ways on what choices you make for tests, some rather subtle and some not so subtle.

For example, unpacking the arrayref first (my @ary = @{ $_[0] };) reduces the difference to a mere few percent, since the runtime is dominated by creating a local array. That would throw your tests -- or perhaps reflect your situation. You need to make careful choices for your conditions.

The numbers of course also vary with array sizes, complexity of the calculation, etc.

Upvotes: 5

tdcc
tdcc

Reputation: 125

Option 2 is slightly faster on my machine...

my @array;
for ( my $i = 0; $i < 100000000; $i++ ) {
  push @array, $i;
}


my $start = time;
my @filtered = grep { $_/2 eq 0 } @array;
my $duration = time - $start;
print "Execution time: $duration s\n";

my $start = time;
foreach (@array) 
{ 
  if($_/2 eq 0) {
    push @filtered, $_;
  } 
}
my $duration = time - $start;
print "Execution time: $duration s\n";

Output:

Execution time: 296 s
Execution time: 233 s

Upvotes: 0

haukex
haukex

Reputation: 3013

As @toolic commented, see Benchmark. I changed your condition slightly so that the output array is actually populated with something:

use warnings;
use strict;
use Benchmark qw/cmpthese/;

for my $arraylen (10, 100, 10_000) {
    print "#### arraylen $arraylen ###\n";
    my @array = 1..$arraylen;
    cmpthese( -1, {
        grep => sub {
            my @filtered = grep { $_%2 } @array;
        },
        loop => sub {
            my @filtered;
            foreach (@array) { 
                if ($_%2) { 
                    push @filtered, $_;
                } 
            }
        },
    });
}

Output:

#### arraylen 10 ###
          Rate loop grep
loop 1514033/s   -- -10%
grep 1683494/s  11%   --
#### arraylen 100 ###
         Rate loop grep
loop 176987/s   -- -16%
grep 210436/s  19%   --
#### arraylen 10000 ###
       Rate loop grep
loop 1794/s   -- -18%
grep 2201/s  23%   --

So it appears that, at least on my machine with my Perl, grep is faster. However, as always with optimizations, you might want to consider whether this performance impact is significant in the context of your larger program, since for example trying to stuff a complicated condition into a grep can make your code less readable. And you should benchmark the code you are actually running instead of taking "grep is always faster than foreach+push" as a general rule!

Upvotes: 1

Related Questions