dragon951

Reputation: 396

Interpolating a non-interpolated passed string inside a subroutine in Perl

I am looking to parse a tab-delimited text file into a nested hash with a subroutine. Each row of the file will be keyed by a unique id (uid) taken from one or more columns, with the header row supplying the nested keys. Which column(s) make up the uid varies: sometimes no single column is unique, so the uid has to be a combination of columns. My issue is with the $uid variable, which I pass as a non-interpolated (single-quoted) string. When I try to use it inside the subroutine in an interpolated way, I only get the literal, non-interpolated value:

    use strict;
    use warnings;

    my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");

    ##sub to generate table hash from file w/ headers
    ##input values are file, uid, header starter, row starter, max column number
    ##returns hash reference (deref it)
    sub tablehash   { 
        my ($file, $uid, $headstart, $rowstart, $colnum) = @_;
        if (!$colnum){ # takes care of a unknown number of columns
            $colnum = 0;
        }
        open(INA, $file) or die "failed to open $file, $!\n";
        my %table; # permanent hash table 
        my %row; # hash of column values for each row
        my @names = (); # column headers
        my @values = (); # line/row values
        while (chomp(my $line = <INA>)){ # reading lines for lof info
            if ($line =~ /^$headstart/){
                @names = split(/\t/, $line, $colnum);
            } elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
                @values = split(/\t/, $line, $colnum);
                @row{@names} = @values;
                print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
                $table{"$uid"} = { %row }; # puts row hash into permanent hash, but with $row{gene} key)
            }
        }
        close INA;
        return \%table;
    }

I am out of ideas. I could use $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want a $uid of "$row{gene}|$row{rsid}", producing a key like $table{'ACB1|123456'}.

Upvotes: 4

Views: 468

Answers (1)

melpomene

Reputation: 85767

Interpolation is a feature of the Perl parser. When you write something like

    "foo $bar baz"

Perl compiles it into something like

    'foo ' . $bar . ' baz'

It does not re-examine string data for variables at runtime.

What you have is an ordinary string that happens to contain a $ character; at runtime that character has no special effect.
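To see that the $ only matters to the parser, compare a double-quoted and a single-quoted string that look alike. The show subroutine and the sample value ACB1 are hypothetical; show just passes a string through a subroutine the way tablehash receives $uid.

```perl
use strict;
use warnings;

my %row = (gene => 'ACB1');

# Double quotes: the parser turns this into 'uid=' . $row{gene} at compile time.
my $interpolated = "uid=$row{gene}";

# Single quotes: just the characters $, r, o, w, {, ... nothing is looked up.
my $literal = 'uid=$row{gene}';

# Hypothetical helper: interpolates $s itself, but NOT the text stored in $s.
sub show {
    my ($s) = @_;
    return "value: $s";
}

print show($interpolated), "\n";   # value: uid=ACB1
print show($literal), "\n";        # value: uid=$row{gene}
```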


There are at least two possible ways to do something like what you want. One of them is to pass a function instead of a string. That fits, because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.

    my $lofrow = tablehash($lof_file, sub { my ($row) = @_; $row->{gene} }, "transcript", "ENST");

    sub tablehash {
        my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
        ...
                    my $uid = $mkuid->(\%row);
                    $table{$uid} = { %row };

Here $mkuid isn't a string but a reference to a function that, given a hash reference, returns a uid string. tablehash calls it with a reference to %row. To build a combined key, you can later change the call to, e.g.:

    my $lofrow = tablehash($lof_file, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
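Putting the function-reference approach together with the random notes in this answer, a minimal runnable sketch might look like this. The column names (transcript, gene, rsid) and IDs are made up; the data is read from an in-memory string through open's scalar-reference form so no file is needed; and quoting the prefixes with \Q...\E is an added choice, not part of the original code.

```perl
use strict;
use warnings;

# Sketch of tablehash() where the caller passes a code reference that
# builds the uid from the current row (given as a hash reference).
# Arguments: file (a path, or a scalar ref for in-memory data),
# uid builder, header prefix, row prefix, optional split limit.
sub tablehash {
    my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
    $colnum ||= 0;    # 0 means "no limit" for split

    open my $fh, '<', $file or die "$0: can't open $file: $!\n";
    my %table;
    my @names;    # column headers
    while (my $line = readline $fh) {
        chomp $line;
        # \Q...\E treats the prefixes as literal text, not as regexes
        if ($line =~ /^\Q$headstart\E/) {
            @names = split /\t/, $line, $colnum;
        }
        elsif ($line =~ /^\Q$rowstart\E/) {
            my %row;    # fresh hash for every data row
            @row{@names} = split /\t/, $line, $colnum;
            my $uid = $mkuid->(\%row);    # caller decides how the key is built
            $table{$uid} = \%row;         # %row is new each time, so no copy needed
        }
    }
    close $fh;
    return \%table;
}

# Hypothetical sample data held in memory instead of a real file.
my $data = "transcript\tgene\trsid\n"
         . "ENST0001\tACB1\t123456\n"
         . "ENST0002\tXYZ2\t789\n";

my $by_gene = tablehash(\$data, sub { my ($row) = @_; $row->{gene} },
                        'transcript', 'ENST');
my $by_pair = tablehash(\$data, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" },
                        'transcript', 'ENST');

print join(',', sort keys %$by_gene), "\n";          # ACB1,XYZ2
print join(',', sort keys %$by_pair), "\n";          # ACB1|123456,XYZ2|789
print $by_pair->{'ACB1|123456'}{transcript}, "\n";   # ENST0001
```

Because the uid logic lives in the caller's closure, tablehash itself never needs to know which columns form the key.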

Another solution is to use what amounts to a template string:

    my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");

    sub tablehash {
        my ($file, $uid_template, $headstart, $rowstart, $colnum) = @_;
        ...
                    (my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
                    $table{$uid} = { %row };

The s/// code goes through the template string and replaces every word (each run of \w characters) with the corresponding value from %row; other characters, such as the | separator, are left as-is.
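The substitution step can be tried on its own. In this sketch, uid_from_template is a hypothetical helper name and the row values are made up. Note that a column name containing non-word characters (e.g. a hyphen) would be split into several "words", and a name missing from the row yields an "uninitialized value" warning and an empty piece.

```perl
use strict;
use warnings;

# Hypothetical helper: every run of word characters in the template is
# treated as a column name and replaced by that column's value.
sub uid_from_template {
    my ($template, $row) = @_;
    (my $uid = $template) =~ s/(\w+)/$row->{$1}/g;
    return $uid;
}

my %row = (transcript => 'ENST0001', gene => 'ACB1', rsid => '123456');

print uid_from_template('gene', \%row), "\n";        # ACB1
print uid_from_template('gene|rsid', \%row), "\n";   # ACB1|123456
```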


Random notes:

  • Bonus points for using strict and warnings.
  • if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;
  • Use lexical variables instead of bareword filehandles. Barewords are effectively global variables (and syntactically awkward because they're not first-class citizens of the language).
  • Always use the 3-argument form of open to avoid unexpected interpretation of the second argument.
  • Include the name of your program in error messages (either explicitly with $0 or implicitly by omitting \n from die).
  • my @foo = (); my %bar = (); is redundant and can be simplified to my @foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
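  • chomp(my $line = <INA>) will emit an "uninitialized value" warning when you reach EOF (because you're trying to chomp a variable containing undef). Also, since the loop tests chomp's return value rather than the line, a final line without a trailing newline is silently skipped.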
  • my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
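The chomp-in-the-condition pitfall is easy to reproduce: chomp returns the number of characters it removed, so a last line with no trailing newline makes the loop condition false before that line is processed. The two counting subroutines and the sample data below are hypothetical; the second one uses the readline pattern suggested next.

```perl
use strict;
use warnings;

# Hypothetical sample: two rows, the last one WITHOUT a trailing newline.
my $data = "ENST0001\tACB1\nENST0002\tXYZ2";

# Pattern from the question: the loop tests chomp's return value
# (number of characters removed), not whether a line was read.
sub count_with_chomp_condition {
    my ($text) = @_;
    open my $fh, '<', \$text or die "$0: can't open in-memory data: $!\n";
    my $n = 0;
    while (chomp(my $line = <$fh>)) {
        $n++;
    }
    close $fh;
    return $n;
}

# Suggested pattern: test the line itself, then chomp inside the loop.
sub count_with_readline {
    my ($text) = @_;
    open my $fh, '<', \$text or die "$0: can't open in-memory data: $!\n";
    my $n = 0;
    while (my $line = readline $fh) {
        chomp $line;
        $n++;
    }
    close $fh;
    return $n;
}

print count_with_chomp_condition($data), "\n";   # 1: the unterminated last row is skipped
print count_with_readline($data), "\n";          # 2
```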

Suggestion:

    open my $fh, '<', $file or die "$0: can't open $file: $!\n";
    while (my $line = readline $fh) {
        chomp $line;
        ...
    }

Upvotes: 3
