André Pontes
André Pontes

Reputation: 467

What is the fastest way to increment a string in perl?

I would like to append a string in perl within a loop in a fast way, without having to copy the string for each iteration. I'm looking for something like StringBuilder from Java or C#.

I currently know the following alternatives in mind, in order to do 'a += b'.

  1. a .= b # concat
  2. a = join('', a, b); # join
  3. push @a, b # array push

I am not interested in copying all string to the other. I need to copy one character per time, or append small strings foreach iteration. I am trying to solve the following problem: compress the input string 'aaabbccc' to '3a2b3c'. So the idea is to iterate over the input string, check how many repeated characters we have, and then append to the output in the compressed way. What is the most efficient to perform this in perl ?

Here is a link to the problem I was trying to solve. I's slightly different though.

Upvotes: 3

Views: 882

Answers (2)

Håkon Hægland
Håkon Hægland

Reputation: 40758

For comparsion, I tried to test different versions for solving your actual problem of compressing the string. Here is my test script test.pl:

use strict;
use warnings;

use Benchmark qw(cmpthese);
use Inline C => './compress_c.c';

my $str_len = 10000;
my @chars = qw(a b c d);
my $str;
$str .= [@chars]->[rand 4] for 1 .. $str_len;

cmpthese(
    -1,
    {
        compress_array => sub { compress_array( $str ) },
        compress_regex => sub { compress_regex( $str ) },
        compress_str   => sub { compress_str( $str ) },
        compress_c     => sub { compress_c( $str ) },
    }
);

# Suggested by @melpomene in the comments   
sub compress_regex {
    return $_[0] =~ s/([a-z])\1+/($+[0] - $-[0]) . $1/egr;
}

sub compress_array {
    my $result = '';

    my @chrs = split //, $_[0];

    my $prev = $chrs[0];
    my $count = 1;
    my @result;
    for my $i ( 1..$#chrs ) {
        my $char = $chrs[$i];
        if ( $prev eq $char ) {
            $count++;
            next if $i < $#chrs;
        }
        if ( $count > 1) {
            push @result, $count, $prev;
        }
        else {
            push @result, $prev;
        }
        if ( ( $i == $#chrs ) and ( $prev ne $char ) ) {
            push @result, $char;
            last;
        }
        $count = 1;
        $prev = $char;
    }

    return join '', @result;
}

sub compress_str {
    my $result = '';
    my $prev = substr $_[0], 0, 1;
    my $count = 1;
    my $lastind = (length $_[0]) - 1;
    for my $i (1 .. $lastind) {
        my $char = substr $_[0], $i, 1;
        if ( $prev eq $char ) {
            $count++;
            next if $i < $lastind;
        }

        if ( $count > 1) {
            $result .= $count;
        }
        $result .= $prev;
        if ( ( $i == $lastind ) and ( $prev ne $char ) ) {
            $result .= $char;
            last;
        }
        $count = 1;
        $prev = $char;
    }

    return $result;
}

where compress_c.c is:

SV *compress_c(SV* str_sv) {
    STRLEN len;
    char* str = SvPVbyte(str_sv, len);

    SV* result = newSV(len);
    char *buf = SvPVX(result);

    char prev = str[0];
    int count = 1;
    int j = 0;
    int i;
    for (i = 1; i < len; i++ )
    {
    char cur = str[i];
        if ( prev == cur ) {
            count++;
            if ( i < (len - 1) )
                continue;
        }

        if ( count > 1) {
            buf[j++] = count + '0';  // assume count is less than 10
        }

        buf[j++] = prev;
        if ( (i == (len - 1)) && (prev != cur) ) buf[j++] = cur;
        count = 1;
        prev = cur;
    }

    buf[j] = '\0';
    SvPOK_on(result);
    SvCUR_set(result, j);
    return result;
}

The result of running perl test.pl:

                  Rate compress_array  compress_str compress_regex    compress_c
compress_array   311/s             --          -42%           -45%          -99%
compress_str     533/s            71%            --            -6%          -98%
compress_regex   570/s            83%            7%             --          -98%
compress_c     30632/s          9746%         5644%          5273%            --

Which shows that regex version is slightly faster than the string version. However, the C version is the fastest, and it is about 50 times as fast as the regex version.

Note: I tested this on my Ubuntu 16.10 laptop (Intel Core i7-7500U CPU @ 2.70GHz)

Upvotes: 4

Andr&#233; Pontes
Andr&#233; Pontes

Reputation: 467

I've performed the following benchmark in several ways to perform that:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $dna;
$dna .= [qw(G A T C)]->[rand 4] for 1 .. 10000;

sub frequency_concat {
    my $result = '';

    for my $idx (0 .. length($dna) - 1) {
            $result .= substr($dna, $idx, 1);
    }

    return $result;
 }

 sub frequency_join {
    my $result = '';

    for my $idx (0 .. length($dna) - 1) {
            $result = join '', $result, substr($dna,$idx,1);
    }

    return $result;
}

sub frequency_list_push {
       my @result = ();

       for my $idx (0 .. length($dna) - 1) {
               push @result, substr($dna,$idx,1);
       }

       return join '', @result;
 }

 sub frequency_list_prealloc {
            my @result = (' ' x length($dna));

            for my $idx (0 .. length($dna) - 1) {
                    $result[$idx] = substr($dna,$idx,1);
            }

            return join '', @result;
 }


cmpthese(-1, # Run each for at least 1 second(s)   {
               concat => \&frequency_concat,
               join => \&frequency_join,
               list_push => \&frequency_list_push,
               list_list_prealloc => \&frequency_list_prealloc
       }
   );

The results below have shown that the concat (a . b) is the fastest operation. I don't understand why, since this will need to make several copies of the string.

                    Rate         join   list_push list_list_prealloc          concat
join               213/s           --        -38%               -41%        -74%
list_push          342/s          60%          --                -5%        -58%
list_list_prealloc 359/s          68%          5%                 --        -56%
concat             822/s         285%        140%               129%          --

Upvotes: 2

Related Questions