Reputation: 2498
I'm working on a script that generates several large hash-of-arrays (HoA) data structures. I'm trying to optimize the script, as it currently takes a considerable amount of time to run.
I've done a bit of benchmarking, and I've managed to make the script roughly 3.5 times faster by using array references and by reducing subroutine call overhead, reading @_
directly instead of copying it into a variable. I've also removed unnecessary subroutines and redundant variable declarations. Despite these improvements, I'd like to make the code run even faster.
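By "using @_ directly" I mean roughly this (a toy illustration, not my real subroutines):
# Copies the argument into a lexical before using it:
sub count_slow { my ($aref) = @_; return scalar @$aref; }
# Reads @_ in place, skipping the copy:
sub count_fast { return scalar @{ $_[0] }; }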
At the start of my script I parse a large file to generate two HoA data structures. Which of the following approaches to hash references is the most practical and efficient? The HoA will look something like this:
%HoA = (
    'C1' => ['1', '3', '3', '3'],
    'C2' => ['3', '2'],
    'C3' => ['1', '3', '3', '4', '5', '5'],
    'C4' => ['3', '3', '4'],
    'C5' => ['1'],
);
OPTION 1
Generate the HoA as I parse the file (see below), and finally put the hash of arrays into a hash ref:
my $hash_ref = \%HoA;
OPTION 2
Parse the file such that each key in the HoA gets a value pointing to an array ref as it is built, and finally put the hash of arrays into a hash ref.
I feel like OPTION 2 is a good approach but how do I do this?
Here's how I'm currently doing it.
use strict; use warnings;
open(F1, "file.txt") or die $!;
my %HoA = ();
while (<F1>) {
    $_ =~ s/\r//;
    chomp;
    my @cols = split(/\t/, $_);
    push( @{ $HoA{$cols[0]} }, @cols[1..$#cols] );
}
close F1;
I need an efficient data structure that will let me look up values and keys quickly. I also need to be able to pass the values (the arrays), the keys, and the HoA itself into subroutines, as efficiently as possible, many times over.
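For context, this is roughly the call pattern I mean; the subroutine name is just a placeholder:
# process_key reads its arguments straight from @_:
# $_[0] is the key, $_[1] is the array ref, $_[2] is the hash ref.
sub process_key {
    my $count = scalar @{ $_[1] };
    return "$_[0] has $count values";
}
for my $key (keys %HoA) {
    print process_key($key, $HoA{$key}, \%HoA), "\n";
}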
Upvotes: 3
Views: 141
Reputation: 385915
You created %HoA and never used it.
You created $HoA_ref and never used it.
You used $HoA without declaring it. Always use use strict; use warnings;.
The = () in my %HoA = (); is silly.
The s/// and the chomp can be merged into one substitution.
Don't mention $_ when not needed, or use a meaningful variable name.
All of the above and a few other improvements have been made to get:
use strict;
use warnings;
open(my $fh, '<', 'file.txt') or die $!;
my %HoA;
while (<$fh>) {
    s/\r?\n\z//;
    my ($key, @cols) = split /\t/;
    push @{ $HoA{$key} }, @cols;
}
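Note that push @{ $HoA{$key} }, @cols; already autovivifies an anonymous array for each key, so the values are array references either way; OPTION 1 and OPTION 2 end up producing the same structure. Taking a reference afterwards costs a single operation:
my $hash_ref = \%HoA;             # OPTION 1, essentially free
my @c1 = @{ $hash_ref->{C1} };    # all values for key C1
my $first = $hash_ref->{C1}[0];   # a single element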
Upvotes: 4
Reputation:
Since you have a large file, rather than reading it with a while loop I would suggest slurping it in completely using the module File::Slurp.
File::Slurp's read_file function tries to bypass Perl's buffered I/O by using a sysread call (check the read_file source code).
my $text = read_file( $file ) ;
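A sketch of how the slurped data could feed the same HoA construction, assuming the tab-separated format from the question; in list context read_file returns the file's lines:
use File::Slurp;
my @lines = read_file('file.txt');    # slurp once, get a list of lines
my %HoA;
for my $line (@lines) {
    $line =~ s/\r?\n\z//;             # strip LF or CR/LF
    my ($key, @cols) = split /\t/, $line;
    push @{ $HoA{$key} }, @cols;
}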
Upvotes: 1
Reputation: 46960
My experience is that it's best to use references wherever possible. Some additional notes:
If you need $_ =~ s/\r//;
for Windows EOL compatibility, then you need a better Perl build; ActiveState's is usually the most robust. chomp
ought to take care of a terminal CR/LF, or rather the file read ought to have already converted the CR/LF pair to a lone LF.
Perl's shift is O(1) and very fast. You can use this to your advantage here.
You can't tell in advance what will be fastest; benchmarking the options is the only way to go.
Try reading the input file alone, with no processing. Once the job is I/O bound, further optimization no longer helps.
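For example, the core Benchmark module can compare candidate parsing styles directly; here the two subs stand in for whatever options you're weighing:
use Benchmark qw(cmpthese);
my $line = "C3\t1\t3\t3\t4\t5\t5";
# Run each candidate for at least 3 CPU seconds and print a comparison.
cmpthese(-3, {
    shift_key => sub { my @cols = split /\t/, $line; my $key = shift @cols; },
    list_key  => sub { my ($key, @cols) = split /\t/, $line; },
});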
Here is what I'd start with:
open(F, "file.txt") or die $!;
my $h = {};
while (<F>) {
    chomp;
    my @cols = split "\t";
    my $key = shift @cols;
    push @{ $h->{$key} }, @cols;
}
close F;
Upvotes: 2
Reputation: 4088
I think this is what you're attempting to do in your example.
open(my $fh, "<", "file.txt") or die $!;
my $HoA_ref = {};    # {} gives us an empty anonymous hash reference
while (my $line = <$fh>) {
    $line =~ s/\r//;
    chomp $line;
    my @cols = split(/\t/, $line);
    # shift off the first element in the list to use as the key
    my $key = shift(@cols);
    # set the value to an array ref of whatever is left in the list
    $HoA_ref->{$key} = [@cols];
}
close $fh;
It's worth noting that the value for $key
will get overwritten if the same key appears more than once as you loop through the file.
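If repeated keys should accumulate their values instead, push onto the array reference the way the question's own loop does:
# Appends to the key's array instead of replacing it:
push @{ $HoA_ref->{$key} }, @cols;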
Upvotes: 1