user1249760
user1249760

Reputation: 153

error pushing value into hash of array

I am parsing an output report from psiblast. I used COG alignments and searched a gene database for matches (homologues). One thing that I want to do is to find out which genes match to more than one COG. My partial script is below.

I am specifically having problems creating an array that holds all of the COGs for the genes that are assigned to multiple COGs.

I am getting the following error "Can't use string ("COG0003") as an ARRAY ref while "strict refs" in use at parse_POG_reports.pl line 26, line 67.".

I have looked at other posted relating to pushing elements into hashes of arrays. But I think the error might be occurring when one gene has 2 matches to the same COG, and it tries to push the same COG into the array (ie. the last 2 lines of the sample input). Does this make sense? If so, how can I avoid this problem?

use strict;
use warnings;

my %maxBits;my %COGhit_count;
my $Hohits={};my %COGhits;

my $COG_psi_report=$ARGV[0];
open (IN, $COG_psi_report) or die "cannot open $COG_psi_report\n";
while (my $line=<IN>){
    next if ($line =~/^#/);
    chomp $line;
    my @columns = split(/\t/,$line);
    my $bits=$columns[11];
    my $COG=$columns[0];
    my $hit=$columns[1];
    my $Eval=$columns[10];
    next if ($Eval > 0.00001); # threshold for significant hits set by DK
    $COGhit_count{$hit}++; # count how many COGs each gene is homologous to
    $COGhits{$hit}=$COG;
    if ($COGhit_count{$hit}>1) {
            push @{$COGhits{$hit}}, $COG; #
    }
    ## for those that there are multiple hits we need to select top hit ##
    if (!exists $maxBits{$hit}){
            $maxBits{$hit}=$bits;
    }
    elsif (exists $maxBits{$hit} && $bits > $maxBits{$hit}){
            $maxBits{$hit}=$bits;
    }
    $Hohits->{$hit}->{$bits}=$COG;
}
close (IN);

example Input:

POG0002 764184357-stool1_revised_scaffold22981_1_gene47608      23.90   159     112     3       1       156     1       153     2e-06   54.2
POG0002 764062976-stool2_revised_C999233_1_gene54902    23.63   182     121     5       3       169     2       180     2e-06   53.9
POG0002 763901136-stool1_revised_scaffold39447_1_gene145241     26.45   155     89      3       3       137     5       154     3e-06   53.9
POG0002 765701615-stool1_revised_C1349270_1_gene168522  23.53   187     115     5       3       169     2       180     5e-06   53.1
POG0002 158802708-stool2_revised_C1077267_1_gene26470   22.69   216     158     5       3       213     5       216     5e-06   52.7
POG0003 160502038-stool1_revised_scaffold47906_2_gene161164     33.00   297     154     6       169     424     334     626     6e-40    157
POG0003 160502038-stool1_revised_scaffold47906_2_gene161164     16.28   172     128     4       23      192     46      203     1e-06   56.6
POG0003 158337416-stool1_revised_C1254444_1_gene13533   30.06   346     184     7       133     424     57      398     6e-40    155
POG0003 158337416-stool1_revised_scaffold29713_1_gene153054     28.61   332     194     8       132     424     272     599     2e-38    152
POG0003 158337416-stool1_revised_scaffold29713_1_gene153054     24.00   200     131     5       1       193     5       190     9e-11   69.3

Upvotes: 2

Views: 589

Answers (2)

Axeman
Axeman

Reputation: 29854

It's telling you exactly what you're doing wrong.

$COGhits{$hit}=$COG;  # <--- scalar
if ($COGhit_count{$hit}>1) {
        push @{$COGhits{$hit}}, $COG; # <--- array
}

You can't assign the value as a non-reference type and then try to autovivify it as a reference type. Perl will do the latter, but not if you've already stored a conflicting data type in that location.

Also, if this by some miracle worked the first time (it won't), and you ran this more than once, any array that you might have autoviv-ed by the push, would be clobbered by the scalar non-reference assignment.

I'm not sure what you're after, but the first line should probably be deleted.


Instead of that construct, you want to decide whether there will ever be more than one specification of $COG for a value of $hit. If there can be, simply replacing those 4 lines with the push is the way to go.

I have done multi-purpose structure slots before, and they are largely a pain to maintain. But if you wanted to do something like that, you can do this:

my $ref = \$hashref->{ $key }; # autovivifies slot as simple scalar.
                               # it starts out as undefined.
if ( ref $$ref ) {             # ref $ref will always be true
    push @$$ref, $value;
}
else { 
    $$ref = defined( $$ref ) ? [ $$ref, $value ] : $value;
}

But you have to write bifurcated logic every time you want to access the mixed tree in some different way. The performance savings that you get with scalars, is somewhat eaten up by the tests and branching.

So I don't do too much of this anymore. I decide beforehand whether the relationship is 1-1 or 1-n. Routines like the ones below can make it more straightforward dealing with these kinds of tables, to a degree.

sub get_list_from_hash { 
    my ( $hash, $key ) = @_;
    my $ref = \$hash->{ $key };
    return unless defined( $$ref );
    return ref( $$ref ) ? @$$ref : $$ref;
}

sub store_in_hash { 
    $_[0] = {} unless ref $_[0];
    my ( $hash, $key, @values ) = @_;
    my @defined = grep {; defined } @values;
    unless ( @defined ) { 
        delete $hash->{ $key };
        return;
    }

    my $ref = \$hash->{ $key };
    if ( ref $$ref ) { 
        push @$$ref, @defined;
    }
    elsif ( defined $$ref ) { 
        $$ref = [ $$ref, @defined ];
    }
    elsif ( @values > 1 ) { 
        @$$ref = @defined;
    }
    else { 
        ( $$ref ) = @defined;
    }
}

Upvotes: 0

simbabque
simbabque

Reputation: 54323

You need to get rid of line 24 (counting backwards):

$COGhits{$hit}=$COG;

In it, you are setting $COGhits{$hit} to a scalar value (the value of $COG). Later, in line 26 you are trying to dereference $COGhits{$hit} as an array to push into it. That doesn't work because there's a scalar in there.

Just remove the if and change those lines into this. That should do the trick as now all those $hits are stored in array references.

$COGhit_count{$hit}++; # count how many COGs each gene is homologous to
push @{$COGhits{$hit}}, $COG;

Output of $COGhits:

$VAR4 = {
      '158802708-stool2_revised_C1077267_1_gene26470' => [
                                                           'POG0002'
                                                         ],
      '764062976-stool2_revised_C999233_1_gene54902' => [
                                                          'POG0002'
                                                        ],
      '764184357-stool1_revised_scaffold22981_1_gene47608' => [
                                                                'POG0002'
                                                              ],
      '765701615-stool1_revised_C1349270_1_gene168522' => [
                                                            'POG0002'
                                                          ],
      '763901136-stool1_revised_scaffold39447_1_gene145241' => [
                                                                 'POG0002'
                                                               ],
      '160502038-stool1_revised_scaffold47906_2_gene161164' => [
                                                                 'POG0003',
                                                                 'POG0003'
                                                               ]
    };

If you however want both the scalar and the array ref, try this code. I don't recommend this, though.

$COGhit_count{$hit}++; # count how many COGs each gene is homologous to
if ($COGhit_count{$hit} == 1) {
  $COGhits{$hit}=$COG;             # Save as scalar
}
elsif ($COGhit_count{$hit} == 2) { # If we've just found the second hit,
  my $temp = $COGhits{$hit};       # save the first and convert $COGhits{$hit}
  $COGhits{$hit} = [];             # to an array ref, then push both the old and
  push @{$COGhits{$hit}}, $temp, $COG; # the new value in it.
} elsif ($COGhit_count{$hit} > 2) {
  push @{$COGhits{$hit}}, $COG;    # Just push the new value in
}

Thought: You probably had $COGhits{$hit}=$COG first but then noticed that sometimes there can be more than one value, so you added the push line, but you did not realized that you in fact had to replace the old line.

Upvotes: 1

Related Questions