Reputation: 396
I am looking to parse a tab-delimited text file into a nested hash with a subroutine. Each file row will be keyed by a unique id built from one or more uid columns, with the header row supplying the nested keys. Which column or columns make up the uid varies, because sometimes no single column is unique and the uid has to be a combination of columns. My issue is with the $uid variable, which I pass as a non-interpolated string. When I try to use it inside the subroutine in an interpolated way, it only gives me the non-interpolated value:
use strict;
use warnings;
my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");
##sub to generate table hash from file w/ headers
##input values are file, uid, header starter, row starter, max column number
##returns hash reference (deref it)
sub tablehash {
    my ($file, $uid, $headstart, $rowstart, $colnum) = @_;
    if (!$colnum){ # takes care of an unknown number of columns
        $colnum = 0;
    }
    open(INA, $file) or die "failed to open $file, $!\n";
    my %table; # permanent hash table
    my %row; # hash of column values for each row
    my @names = (); # column headers
    my @values = (); # line/row values
    while (chomp(my $line = <INA>)){ # reading lines for lof info
        if ($line =~ /^$headstart/){
            @names = split(/\t/, $line, $colnum);
        } elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
            @values = split(/\t/, $line, $colnum);
            @row{@names} = @values;
            print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
            $table{"$uid"} = { %row }; # puts row hash into permanent hash, but with the literal $row{gene} as key
        }
    }
    close INA;
    return \%table;
}
I am out of ideas. I could use $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want a $uid of "$row{gene}|$row{rsid}", producing $table{ACB1|123456}.
Upvotes: 4
Views: 468
Reputation: 85767
Interpolation is a feature of the Perl parser. When you write something like "foo $bar baz", Perl compiles it into something like 'foo ' . $bar . ' baz'. It does not interpret data at runtime.
What you have is a string in which one of the characters happens to be $, but that has no special effect.
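To make the compile-time nature of interpolation concrete, here is a minimal sketch. The hash contents are made-up sample values; the point is only that a string which already contains a dollar sign is not scanned again when it is used later.

```perl
use strict;
use warnings;

my %row = (gene => 'ACB1');    # made-up sample value

# Double quotes: the parser compiles this into a concatenation,
# so the current value of $row{gene} ends up in the string.
my $interpolated = "id=$row{gene}";

# Single quotes: the dollar sign is an ordinary character.
my $literal = 'id=$row{gene}';

# Interpolating $literal later inserts its characters as they are;
# Perl does not rescan the inserted text for variables.
my $second_pass = "$literal";

print "$interpolated\n";    # id=ACB1
print "$second_pass\n";     # id=$row{gene}
```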
There are at least two possible ways to do something like what you want. One of them is to use a function, not a string. (Which makes sense because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.)
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; $row->{gene} }, "transcript", "ENST");
sub tablehash {
my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
...
my $uid = $mkuid->(\%row);
$table{$uid} = { %row };
Here $mkuid isn't a string but a reference to a function that (given a hash reference) returns a uid string. tablehash calls it, passing a reference to %row. You can then later change it to e.g.
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
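Putting the fragments together, a self-contained sketch of the callback approach might look like the following. It is not the asker's full program: the sample data is invented, and the "file" is an in-memory string passed by reference (which 3-argument open accepts) so the example runs without a real input file.

```perl
use strict;
use warnings;

# Sketch only: parse tab-delimited text, keying each row by whatever
# the $mkuid callback returns for that row.
sub tablehash {
    my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
    $colnum ||= 0;    # a limit of 0 lets split return every column

    open my $fh, '<', $file or die "$0: can't open $file: $!\n";

    my (%table, @names);
    while (my $line = readline $fh) {
        chomp $line;
        if ($line =~ /^$headstart/) {
            @names = split /\t/, $line, $colnum;
        }
        elsif ($line =~ /^$rowstart/) {
            my %row;    # fresh hash for every data row
            @row{@names} = split /\t/, $line, $colnum;
            my $uid = $mkuid->(\%row);
            $table{$uid} = \%row;
        }
    }
    close $fh;
    return \%table;
}

# Invented sample data standing in for the real file.
my $data = "transcript\tgene\trsid\n"
         . "ENST0001\tACB1\t123456\n"
         . "ENST0002\tXYZ2\t654321\n";

my $by_gene = tablehash(\$data, sub { my ($row) = @_; $row->{gene} },
                        "transcript", "ENST");
my $by_pair = tablehash(\$data, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" },
                        "transcript", "ENST");

print join(",", sort keys %$by_gene), "\n";    # ACB1,XYZ2
print join(",", sort keys %$by_pair), "\n";    # ACB1|123456,XYZ2|654321
```

Because the uid logic lives in the callback, switching from a single column to a combination of columns only changes the caller, not tablehash itself.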
Another solution is to use what amounts to a template string:
my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");
sub tablehash {
my ($file, $uid_template, $headstart, $rowstart, $colnum) = @_;
...
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
$table{$uid} = { %row };
The s/// code goes through the template string and manually replaces every word with the corresponding value from %row.
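The substitution can be tried in isolation. In this sketch, expand_uid is a hypothetical helper name and the row values are invented; the substitution itself is the same one used above, applied to a hash reference instead of %row.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every word in the template with the
# matching column value from the row.
sub expand_uid {
    my ($template, $row) = @_;
    (my $uid = $template) =~ s/(\w+)/$row->{$1}/g;
    return $uid;
}

# Invented sample row.
my %row = (transcript => 'ENST0001', gene => 'ACB1', rsid => '123456');

print expand_uid('gene', \%row), "\n";         # ACB1
print expand_uid('gene|rsid', \%row), "\n";    # ACB1|123456
```

One consequence of matching \w+ is that every run of word characters in the template is treated as a column name, so a word that is not a column triggers an "uninitialized value" warning and is replaced by an empty string.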
Random notes:
- Keep using strict and warnings.
- if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;.
- Include the program name in error messages (either explicitly via $0 or implicitly by omitting \n from die).
- my @foo = (); my %bar = (); is redundant and can be simplified to my @foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
- chomp(my $line = <INA>) will throw a warning when you reach EOF (because you're trying to chomp a variable containing undef).
- my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
Suggestion:
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
while (my $line = readline $fh) {
    chomp $line;
    ...
}
Upvotes: 3