fugu

Reputation: 6568

Sorting a hash of hashes by the inner hash's values

I have a hash of hashes

my %change;

while ( <DATA> ) {
    chomp;
    my ($gene, $condition, $change) = split;
    $change{$gene}{$condition} = $change;
}

print Dumper \%change;

__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

And I want to sort it by value:

gene2   condition2  13.5
gene1   condition1  10
gene1   condition2  2
gene3   condition1  1.5
gene2   condition1  0.5
gene3   condition2  0.25

I'm using:

for my $g (keys %change){
    for my $con (keys $change{$g}){
        for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con} ) {
            print "$g\t$con\t$ch\n";
        }

    }
}

But this doesn't work, and generates the error

Type of argument to keys on reference must be unblessed hashref or arrayref at untitled.pl line 23, <DATA> line 6.

Line 23 is

for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con}){

Can anyone point me in the right direction?

Upvotes: 0

Views: 375

Answers (4)

ikegami

Reputation: 385655

Your structure is only two hashes deep, so:

  • %change is a hash.
  • $change{$g} is a reference to a hash.
  • %{ $change{$g} } is a hash.
  • $change{$g}{$con} is a number.
  • %{ $change{$g}{$con} } is an error, as reported.
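
One way to check these levels for yourself is to ask ref what each expression holds. The following is only a minimal sketch: the hash literal is a stand-in for the data read from DATA, and the variable names $outer_kind, $inner_kind and @conditions are made up for illustration.

```perl
use strict;
use warnings;

# A small stand-in for the question's %change (values copied from the question).
my %change = (
    gene1 => { condition1 => 10,  condition2 => 2    },
    gene2 => { condition1 => 0.5, condition2 => 13.5 },
);

# ref returns 'HASH' for a hash reference and '' for a plain scalar.
my $outer_kind = ref $change{gene1};               # hash reference
my $inner_kind = ref $change{gene1}{condition1};   # plain number, not a reference

print "outer: '$outer_kind', inner: '$inner_kind'\n";

# Only dereference what really is a hash reference.
my @conditions = sort keys %{ $change{gene1} };
print "@conditions\n";
```

Since the innermost value is a plain number, there is nothing left to take keys of, which is what the error message is complaining about.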

The fix is... Well, there is no fix. The approach you took can't be used to solve your problem.


You can't sort a hash. You can sort the keys of a hash, but that's not what you want here. You want to sort key pairs, so first you have to create those key pairs.

map {
   my $outer_key = $_;
   map {
      my $inner_key = $_;
      [ $outer_key, $inner_key ]
   } keys %{ $change{$_} }
} keys(%change)

This creates

[
   [ 'gene1', 'condition1' ],
   [ 'gene1', 'condition2' ],
   [ 'gene2', 'condition1' ],
   [ 'gene2', 'condition2' ],
   [ 'gene3', 'condition1' ],
   [ 'gene3', 'condition2' ],
]

Then we sort them by the value each pair points to:

sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }

All together:

for (
   sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }
   map {
      my $gene = $_;
      map {
         my $con = $_;
         [ $gene, $con ]
      } keys %{ $change{$_} }
   } keys(%change)
) {
   my ($gene, $con) = @$_;
   print("$gene\t$con\t$change{$gene}{$con}\n");
}

But what if we created the following flattened structure instead?

[
   [ 'gene1', 'condition1', 10    ],
   [ 'gene1', 'condition2',  2    ],
   [ 'gene2', 'condition1',  0.5  ],
   [ 'gene2', 'condition2', 13.5  ],
   [ 'gene3', 'condition1',  1.5  ],
   [ 'gene3', 'condition2',  0.25 ],
]

This would allow us to simplify the sort block somewhat.

for (
   sort { $a->[2] <=> $b->[2] }
   map {
      my $gene = $_;
      map {
         my $con = $_;
         [ $gene, $con, $change{$gene}{$con} ]
      } keys %{ $change{$_} }
   } keys(%change)
) {
   my ($gene, $con, $ch) = @$_;
   print("$gene\t$con\t$ch\n");
}
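
The question's expected output runs from largest to smallest, whereas the snippets above sort ascending. Here is a minimal sketch of the same flattened approach in descending order, obtained by swapping $a and $b in the comparison. The hash literal is a stand-in for the data read from DATA, and @rows is a name chosen for illustration.

```perl
use strict;
use warnings;

# Stand-in for the question's %change (values copied from the question).
my %change = (
    gene1 => { condition1 => 10,  condition2 => 2    },
    gene2 => { condition1 => 0.5, condition2 => 13.5 },
    gene3 => { condition1 => 1.5, condition2 => 0.25 },
);

# Flatten to [ gene, condition, value ] rows, then sort on the value.
# Comparing $b to $a (instead of $a to $b) gives descending order.
my @rows =
    sort { $b->[2] <=> $a->[2] }
    map {
        my $gene = $_;
        map { [ $gene, $_, $change{$gene}{$_} ] } keys %{ $change{$gene} };
    } keys %change;

print join("\t", @$_), "\n" for @rows;
```

Because every value in the sample data is distinct, the random key order of the hashes does not affect the sorted result.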

Upvotes: 3

TLP

Reputation: 67900

As I mentioned in the comments, the simplest solution is to not first parse the text input into a hash, then sort the hash, but rather collect the data into a more suitable form and sort it there.

Also, note that you cannot do your sorting while iterating over the values one at a time. You need to compile the complete list first and then sort it in a single call, since sort has to see every element to put them in order.

I have shown first my choice of method for the input given, then how to sort the hash.

use strict;
use warnings;
use feature 'say';

my %change;
my @sort;
while(<DATA>) {
    chomp;
    my ($gene, $condition, $change) = split;
    $change{$gene}{$condition} = $change;
    push @sort, [ $change, $_ ];
}

@sort = sort { $a->[0] <=> $b->[0] } @sort;
say $_->[1] for @sort;

# Using the hash:

my @values;
for my $gene (keys %change) {
    for my $con (keys %{ $change{$gene} }) {
        my $num = $change{$gene}{$con};
        push @values, [ $num, "$gene\t$con\t$num" ];
    }
}
@values = sort { $a->[0] <=> $b->[0] } @values;
say $_->[1] for @values;

__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

As you can see, I am using a sort of cache to access the value more easily. For example, push @sort, [ $change, $_ ] stores an array ref holding the numeric value along with the original line from the input. These values can then be accessed with $a->[0] when sorting, and $_->[1] when printing.

I find this method to be simple and robust, though if your input file is very large it may cause memory issues due to the duplication of data. Anything smaller than gigabytes should be fine on a modern system.

Upvotes: 4

mpapec

Reputation: 50637

You can flatten your hash structure into an array of arrays, and then sort numerically by value (the last element of each inner array).

my $VAR1 = {
      'gene1' => {
                   'condition1' => '10',
                   'condition2' => '2'
                 },
      'gene2' => {
                   'condition1' => '0.5',
                   'condition2' => '13.5'
                 },
      'gene3' => {
                   'condition1' => '1.5',
                   'condition2' => '0.25'
                 }
    };

my @sorted = sort {
    $b->[2] <=> $a->[2]
  }
  map {
    my $k = $_;
    my $h = $VAR1->{$k};
    map [ $k, $_, $h->{$_} ], keys %$h;
  }
  keys %$VAR1;

print "@$_\n" for @sorted;

Output:

gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25

Using foreach instead of map:

my @arr;
for my $k (keys %$VAR1) {
  my $h = $VAR1->{$k};
  for (keys %$h) {
    push @arr, [ $k, $_, $h->{$_} ];
  }
}
my @sorted = sort { $b->[2] <=> $a->[2] } @arr;

Upvotes: 3

Borodin

Reputation: 126722

I think it's very unlikely that you need the data in a hash structure like that. Certainly for the purposes of this task you would be better off with an array of arrays:

use strict;
use warnings;

my @change;

while ( <DATA> ) {
    push @change, [ split ];
}

print "@$_\n" for sort { $b->[2] <=> $a->[2] } @change;


__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

Output:

gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25

If you explain what sort of access you need to the data then I am sure there is something better. For instance, I would suggest %gene and %condition hashes that map a gene or condition ID to a list of the array elements that use that gene or condition. Then you could access the data when you know either the gene or the condition.
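
As a rough sketch of that idea, the index hashes could be built in a single pass over the rows. The details here are assumptions rather than a definitive design: the rows are written out literally instead of being read from DATA, and each index entry holds references to the same row arrays rather than copies.

```perl
use strict;
use warnings;

# Rows as in the array-of-arrays approach: [ gene, condition, value ].
my @change = (
    [ 'gene1', 'condition1', 10   ],
    [ 'gene2', 'condition1', 0.5  ],
    [ 'gene3', 'condition1', 1.5  ],
    [ 'gene1', 'condition2', 2    ],
    [ 'gene2', 'condition2', 13.5 ],
    [ 'gene3', 'condition2', 0.25 ],
);

# Hypothetical index hashes: each key maps to the rows that mention it.
# The rows are shared references, so nothing is duplicated.
my (%gene, %condition);
for my $row (@change) {
    push @{ $gene{ $row->[0] } },      $row;
    push @{ $condition{ $row->[1] } }, $row;
}

# All rows recorded for gene2, largest value first.
my @gene2 = sort { $b->[2] <=> $a->[2] } @{ $gene{gene2} };
print "@$_\n" for @gene2;
```

The same lookup works from the other side, for example @{ $condition{condition1} } for every row measured under condition1.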

Upvotes: 5
