stayingsong

Reputation: 33

How do multiple arrow operators in series work in Perl?

I ran across a piece of Perl code today that I wasn't sure how to interpret, specifically the line $lookup -> {$chr} -> {$start} = $end. I am not sure how multiple infix dereference operators work in series.

The input file contains tab-delimited chromosome names ($chr), start positions ($start), and end positions ($end) on each line. I get that the author is creating a hash table where each $chr maps to an array of the $start values for that chromosome, but I can't work out exactly what the next line accomplishes. Any insight would be much appreciated.

my $hash;
my $lookup;
if (defined $bed_file) {
    open(FILE, $bed_file);
    while (my $line = <FILE>) {
        chomp $line;
        my ($chr, $start, $end) = split(/\t/, $line);
        push(@{$hash -> {$chr}}, $start);
        $lookup -> {$chr} -> {$start} = $end;
    }
    close(FILE);
}

Upvotes: 3

Views: 583

Answers (2)

cdlane

Reputation: 41872

$lookup -> {$chr} -> {$start} = $end

$lookup is (being treated as) a reference to a hash of hashes. $chr is the first-level key, and its value is a reference to another hash. $start is the second-level key, whose value is $end.

This code relies on autovivification. Although $lookup is never initialized to anything, Perl creates the missing structure on demand: when you dereference an undefined scalar as a hash or array in a context that stores into it, the anonymous hash or array springs into existence. The same goes for the $hash variable (a hash of arrays).
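As a minimal sketch of autovivification, the following starts from an undefined scalar and assigns through two levels of hash subscripts. The sample values for $chr, $start, and $end are made up for illustration and stand in for one line of the input file.

```perl
use strict;
use warnings;

my $lookup;    # undefined: no hash exists yet
print defined $lookup ? "defined\n" : "undefined\n";

# Hypothetical sample values, standing in for one line of the input file.
my ($chr, $start, $end) = ('chr1', 100, 200);

# Assigning through the chain creates both hash levels on demand.
$lookup->{$chr}->{$start} = $end;

print ref($lookup), "\n";                 # type of the outer reference
print ref($lookup->{$chr}), "\n";         # type of the inner reference
print $lookup->{$chr}->{$start}, "\n";    # the stored end position
```

Both ref() calls report HASH, which shows that neither level had to be created explicitly.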

Another Perl feature, not employed here, is that the arrow between adjacent subscripts (curly braces or square brackets) is optional. So this code can also read:

$lookup->{$chr}{$start} = $end

possibly better revealing the two level hash structure.
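A short sketch of that equivalence, using made-up keys and values: both spellings store into the same kind of structure, and the same rule applies to array subscripts. Only the first arrow, the one that dereferences the scalar itself, is required.

```perl
use strict;
use warnings;

my ($long, $short);
$long->{chr1}->{100} = 200;    # explicit arrow between the subscripts
$short->{chr1}{100}  = 200;    # arrow between the subscripts omitted

# Either spelling can read back what the other one wrote.
print $long->{chr1}{100} == $short->{chr1}->{100} ? "same\n" : "different\n";

# The rule also covers an array subscript that follows a hash subscript.
my $hash;
push @{ $hash->{chr1} }, 100;
print $hash->{chr1}[0] == $hash->{chr1}->[0] ? "same\n" : "different\n";

# The first arrow cannot be dropped: $short{chr1}{100} would refer to a
# separate hash variable %short, not to the reference held in $short.
```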

$lookup and $hash at the top level are parallel hashes, in that their first level keys are the same. The $hash structure appears to be an optimization as it could be computed from $lookup:

keys(%{$lookup->{$chr}})

vs.

@{$hash->{$chr}}

the difference being that $hash preserves the file order of the $start values (including any repeated values), whereas keys returns them in no particular order.
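To illustrate the difference, here is a sketch that builds both structures from a few made-up records, deliberately listed out of numeric order. Because keys makes no promise about order, the keys are sorted numerically before printing so the result is predictable.

```perl
use strict;
use warnings;

my ($hash, $lookup);

# Hypothetical records (chr, start, end), not sorted by start position.
my @records = (
    [ 'chr1', 300, 350 ],
    [ 'chr1', 100, 150 ],
    [ 'chr1', 200, 250 ],
);

for my $rec (@records) {
    my ($chr, $start, $end) = @$rec;
    push @{ $hash->{$chr} }, $start;    # array keeps insertion order
    $lookup->{$chr}{$start} = $end;     # hash keys have no defined order
}

my @file_order = @{ $hash->{chr1} };
my @sorted     = sort { $a <=> $b } keys %{ $lookup->{chr1} };

print "file order:  @file_order\n";    # 300 100 200
print "sorted keys: @sorted\n";        # 100 200 300
```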

Upvotes: 4

Kusalananda

Reputation: 15603

By saying $lookup->{$chr}->{$start} = $end, the $lookup scalar is turned into a reference to a hash that has chromosome names as keys. The second arrow is optional, so you may also write $lookup->{$chr}{$start} = $end.

Each value in the hash that $lookup refers to is in turn a reference to another hash, with the start position as key and the end position as value.

You may easily investigate the data structure after the loop by adding

use Data::Dumper;
print Dumper($lookup);

You will see something like

$VAR1 = {
          'chr2' => {
                      '1234' => 5678
                    },
          'chr1' => {
                      '1234' => 5678
                    }
        };

This tells you that $lookup is (loosely speaking) a "hash of hashes".
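Once the structure is understood as a hash of hashes, it can be walked with two nested loops. The following is only a sketch with made-up data, built as a literal here rather than read from a file; it prints one tab-separated line per (chromosome, start, end) entry.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as $lookup above.
my $lookup = {
    chr1 => { 100 => 150, 300 => 350 },
    chr2 => { 50  => 75 },
};

my @lines;
for my $chr (sort keys %$lookup) {                                  # outer hash: chromosome names
    for my $start (sort { $a <=> $b } keys %{ $lookup->{$chr} }) {  # inner hash: start positions
        push @lines, join("\t", $chr, $start, $lookup->{$chr}{$start});
    }
}
print "$_\n" for @lines;
```

Sorting the keys at each level is a choice made here for repeatable output; it is not required to traverse the structure.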

Upvotes: 1
