Reputation: 33
I ran across a piece of Perl code today that I wasn't sure how to interpret. Specifically, the line $lookup -> {$chr} -> {$start} = $end, as I am not sure how multiple infix dereference operators work in series.
The input file contains tab-delimited chromosome names ($chr), start positions ($start), and end positions ($end) on each line. I get that the author is creating a hash table where $chr maps to arrays with $start values corresponding to each chromosome, but I can't establish exactly what he is trying to accomplish with the next line. Any insight would be much appreciated.
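my $hash;    # chromosome => array ref of start positions, in file order
my $lookup;  # chromosome => { start position => end position }
if (defined $bed_file) {
    # Three-argument open with a lexical filehandle; fail loudly if it cannot be read.
    open(my $fh, '<', $bed_file) or die "Cannot open $bed_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($chr, $start, $end) = split(/\t/, $line);
        push(@{$hash -> {$chr}}, $start);
        $lookup -> {$chr} -> {$start} = $end;
    }
    close($fh);
}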
Upvotes: 3
Views: 583
Reputation: 41872
$lookup -> {$chr} -> {$start} = $end
$lookup is (being treated as) a reference, Perl's counterpart to a pointer, to a hash of hashes. $chr is the first-level key, and its value is another hash reference. $start is the second-level key, whose value is $end.
This code relies on autovivification. Although $lookup is never initialized, Perl creates the missing hash (or array) the first time an undefined value is dereferenced in a context that stores into it, so the structure springs into existence as it is used. The same goes for the $hash variable (a hash of arrays).
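A minimal sketch of autovivification in isolation may make this concrete. The keys chr1, chr2, 100 and 200 are made-up illustration values, not data from the question.

```perl
use strict;
use warnings;

my $lookup;                                   # undefined: no hash exists yet
my $was_defined = defined $lookup ? 1 : 0;    # 0

# Assigning through two levels makes Perl create both hashes on demand.
$lookup->{chr1}->{100} = 200;

my $outer_type = ref $lookup;            # 'HASH'
my $inner_type = ref $lookup->{chr1};    # 'HASH'

# Caveat: merely testing a nested key also creates the intermediate level.
my $found        = exists $lookup->{chr2}->{100} ? 1 : 0;    # 0, but...
my $chr2_created = exists $lookup->{chr2}        ? 1 : 0;    # ...1: chr2 now exists

print "before: $was_defined, outer: $outer_type, inner: $inner_type\n";
print "found: $found, chr2 created: $chr2_created\n";
```

The last two lines of the sketch show the usual gotcha: looking up a second-level key is enough to create the first-level entry, so check the outer key first if stray entries matter.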
Another Perl feature, not employed here, is arrow collapsing: the arrow between two adjacent subscripts (of either sort, {...} or [...]) is optional. So this code can also read:
$lookup->{$chr}{$start} = $end
possibly better revealing the two-level hash structure.
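As a quick check that the two spellings address the same element, here is a small sketch with made-up keys and values:

```perl
use strict;
use warnings;

my $lookup = {};

$lookup->{chr1}->{100} = 200;    # explicit arrow between subscripts
$lookup->{chr1}{300}   = 400;    # same structure, arrow omitted

# Both spellings read the same element.
my $same = $lookup->{chr1}->{300} == $lookup->{chr1}{300} ? 1 : 0;

# The first arrow is still required: without it, the subscript would apply
# to a separate hash variable named %lookup, not to the reference in $lookup.
my @starts = sort { $a <=> $b } keys %{ $lookup->{chr1} };

print "starts: @starts, same: $same\n";    # starts: 100 300, same: 1
```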
At the top level, $lookup and $hash are parallel hashes, in that their first-level keys are the same. The $hash structure appears to be an optimization, as it could be computed from $lookup:
keys(%{$lookup->{$chr}})
vs.
@{$hash->{$chr}}
the difference being that $hash would preserve the file order of the $start values and $lookup would not.
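The ordering difference can be sketched with a few made-up BED-style lines held in an array instead of a file. The @lines data is hypothetical and deliberately not sorted by start position.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines: chromosome, start, end.
my @lines = ("chr1\t500\t600", "chr1\t100\t200", "chr1\t300\t400");

my ($hash, $lookup);
for my $line (@lines) {
    my ($chr, $start, $end) = split /\t/, $line;
    push @{ $hash->{$chr} }, $start;     # remembers order of appearance
    $lookup->{$chr}{$start} = $end;      # remembers start => end only
}

my @file_order = @{ $hash->{chr1} };
my @sorted     = sort { $a <=> $b } keys %{ $lookup->{chr1} };   # keys need an explicit sort
my @ends       = map { $lookup->{chr1}{$_} } @file_order;        # end for each start, in file order

print "file order: @file_order\n";    # file order: 500 100 300
print "sorted:     @sorted\n";        # sorted:     100 300 500
print "ends:       @ends\n";          # ends:       600 200 400
```

One more difference follows from the same structure: if a start position occurs twice for one chromosome, $hash keeps both entries while $lookup keeps only the last end value.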
Upvotes: 4
Reputation: 15603
By saying $lookup->{$chr}->{$start} = $end (the second arrow is optional; you may also write $lookup->{$chr}{$start} = $end), the $lookup scalar is turned into a reference to a hash that has chromosome names as keys.
Each value in the hash that $lookup refers to is in turn a reference to a hash with the start position as key and the end position as value.
You may easily investigate the data structure after the loop by adding
use Data::Dumper;
print Dumper($lookup);
You will see something like
$VAR1 = {
    'chr2' => {
        '1234' => 5678
    },
    'chr1' => {
        '1234' => 5678
    }
};
This tells you that $lookup is (loosely speaking) a "hash of hashes".
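Once the shape is clear, walking the structure is a pair of nested loops. The following is a sketch on hypothetical data in the same shape as the Dumper output above, with one extra made-up interval added to chr2:

```perl
use strict;
use warnings;

# Hypothetical data: chromosome => { start => end }.
my $lookup = {
    chr1 => { 1234 => 5678 },
    chr2 => { 1234 => 5678, 9000 => 9500 },
};

my @rows;
for my $chr (sort keys %$lookup) {
    my $by_start = $lookup->{$chr};    # inner hash reference for this chromosome
    for my $start (sort { $a <=> $b } keys %$by_start) {
        # Rebuild a tab-separated line: chromosome, start, end.
        push @rows, join "\t", $chr, $start, $by_start->{$start};
    }
}

print "$_\n" for @rows;
```

Sorting the keys at both levels is a choice made here only to get repeatable output; Perl does not guarantee any particular order for hash keys.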
Upvotes: 1