Reputation: 6568
I have a hash of hashes
my %change;
while ( <DATA> ) {
chomp;
my ($gene, $condition, $change) = split;
$change{$gene}{$condition} = $change;
}
print Dumper \%change;
__DATA__
gene1 condition1 10
gene2 condition1 0.5
gene3 condition1 1.5
gene1 condition2 2
gene2 condition2 13.5
gene3 condition2 0.25
And I want to sort it by value:
gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25
I'm using:
for my $g (keys %change){
for my $con (keys $change{$g}){
for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con} ) {
print "$g\t$con\t$ch\n";
}
}
}
But this doesn't work, and generates the error
Type of argument to keys on reference must be unblessed hashref or arrayref at untitled.pl line 23, <DATA> line 6.
Line 23 is
for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con}){
Can anyone point me in the right direction?
Upvotes: 0
Views: 375
Reputation: 385655
Your structure is only two hashes deep, so:
%change is a hash.
$change{$g} is a reference to a hash.
%{ $change{$g} } is a hash.
$change{$g}{$con} is a number.
%{ $change{$g}{$con} } is an error, as reported.
The fix is... Well, there is no fix. The approach you took can't be used to solve your problem.
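To make the levels above concrete, here is a minimal sketch that inspects each level with ref. The sample data is a hand-written subset of the question's input, and the variable names $level1, $level2 and @conditions are only illustrative.

```perl
use strict;
use warnings;

# Illustrative data: a subset of the question's input, written out by hand.
my %change = (
    gene1 => { condition1 => 10,  condition2 => 2 },
    gene2 => { condition1 => 0.5, condition2 => 13.5 },
);

# ref returns 'HASH' for a hash reference and '' for a plain scalar.
my $level1 = ref $change{gene1};              # a hash reference
my $level2 = ref $change{gene1}{condition1};  # a number, not a reference

print "level 1: '$level1'\n";
print "level 2: '$level2'\n";

# Only the first level can be dereferenced and handed to keys.
my @conditions = sort keys %{ $change{gene1} };
print "@conditions\n";
```

Since the second level holds a plain number, there is nothing left for keys to iterate over, which is what the error message is complaining about.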
You can't sort a hash. You can sort the keys of a hash, but that's not what you want to do here. You want to sort key pairs, so first you have to create those key pairs.
map {
my $outer_key = $_;
map {
my $inner_key = $_;
[ $outer_key, $inner_key ]
} keys %{ $change{$_} }
} keys(%change)
This creates
[
[ 'gene1', 'condition1' ],
[ 'gene1', 'condition2' ],
[ 'gene2', 'condition1' ],
[ 'gene2', 'condition2' ],
[ 'gene3', 'condition1' ],
[ 'gene3', 'condition2' ],
]
Then we sort them by the values they refer to:
sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }
All together:
for (
sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }
map {
my $gene = $_;
map {
my $con = $_;
[ $gene, $con ]
} keys %{ $change{$_} }
} keys(%change)
) {
my ($gene, $con) = @$_;
print("$gene\t$con\t$change{$gene}{$con}\n");
}
But what if we created the following flattened structure instead?
[
[ 'gene1', 'condition1', 10 ],
[ 'gene1', 'condition2', 2 ],
[ 'gene2', 'condition1', 0.5 ],
[ 'gene2', 'condition2', 13.5 ],
[ 'gene3', 'condition1', 1.5 ],
[ 'gene3', 'condition2', 0.25 ],
]
This would allow us to simplify some.
for (
sort { $a->[2] <=> $b->[2] }
map {
my $gene = $_;
map {
my $con = $_;
[ $gene, $con, $change{$gene}{$con} ]
} keys %{ $change{$_} }
} keys(%change)
) {
my ($gene, $con, $ch) = @$_;
print("$gene\t$con\t$ch\n");
}
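The expected output in the question lists the largest value first, while $a <=> $b sorts smallest first. A minimal sketch of the descending variant follows, assuming the same data as the question. The tie-breakers on gene and condition are my own addition so that equal values come out in a stable order, and @rows and @sorted are illustrative names.

```perl
use strict;
use warnings;

# Same data as the question, written out by hand.
my %change = (
    gene1 => { condition1 => 10,  condition2 => 2 },
    gene2 => { condition1 => 0.5, condition2 => 13.5 },
    gene3 => { condition1 => 1.5, condition2 => 0.25 },
);

# Flatten into [ gene, condition, value ] triples.
my @rows;
for my $gene (keys %change) {
    for my $con (keys %{ $change{$gene} }) {
        push @rows, [ $gene, $con, $change{$gene}{$con} ];
    }
}

# $b before $a sorts largest first; ties fall back to gene, then condition.
my @sorted = sort {
    $b->[2] <=> $a->[2]
        || $a->[0] cmp $b->[0]
        || $a->[1] cmp $b->[1]
} @rows;

print join("\t", @$_), "\n" for @sorted;
```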
Upvotes: 3
Reputation: 67900
As I mentioned in the comments, the simplest solution is to not first parse the text input into a hash, then sort the hash, but rather collect the data into a more suitable form and sort it there.
Also, note that you cannot do your sorting while iterating over the values. You need to compile a list, and sort that list all at once, since sort is a sort of iterator itself.
I have shown first my choice of method for the input given, then how to sort the hash.
use strict;
use warnings;
use feature 'say';   # needed for say below
my %change;
my @sort;
while(<DATA>) {
chomp;
my ($gene, $condition, $change) = split;
$change{$gene}{$condition} = $change;
push @sort, [ $change, $_ ];
}
@sort = sort { $a->[0] <=> $b->[0] } @sort;
say $_->[1] for @sort;
# Using the hash:
my @values;
for my $gene (keys %change) {
for my $con (keys %{ $change{$gene} }) {
my $num = $change{$gene}{$con};
push @values, [ $num, "$gene\t$con\t$num" ];
}
}
@values = sort { $a->[0] <=> $b->[0] } @values;
say $_->[1] for @values;
__DATA__
gene1 condition1 10
gene2 condition1 0.5
gene3 condition1 1.5
gene1 condition2 2
gene2 condition2 13.5
gene3 condition2 0.25
As you can see, I am using a sort of cache to access the value more easily. For example, push @sort, [ $change, $_ ] stores an array ref with the numeric value, along with the original string from the input. These values can then be accessed with $a->[0] when sorting, and $_->[1] when printing.
I find this method simple and robust, though if your input file is very large, it may cause memory issues because the data is duplicated. Anything smaller than gigabytes should be fine on a modern system.
Upvotes: 4
Reputation: 50637
You can flatten your hash structure, and then sort numerically by value (the last element of each inner array):
my $VAR1 = {
'gene1' => {
'condition1' => '10',
'condition2' => '2'
},
'gene2' => {
'condition1' => '0.5',
'condition2' => '13.5'
},
'gene3' => {
'condition1' => '1.5',
'condition2' => '0.25'
}
};
my @sorted = sort {
$b->[2] <=> $a->[2]
}
map {
my $k = $_;
my $h = $VAR1->{$k};
map [ $k, $_, $h->{$_} ], keys %$h;
}
keys %$VAR1;
print "@$_\n" for @sorted;
Output:
gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25
Using foreach instead of map:
my @arr;
for my $k (keys %$VAR1) {
my $h = $VAR1->{$k};
for (keys %$h) {
push @arr, [ $k, $_, $h->{$_} ];
}
}
my @sorted = sort { $b->[2] <=> $a->[2] } @arr;
Upvotes: 3
Reputation: 126722
I think it's very unlikely that you need the data in a hash structure like that. Certainly for the purposes of this task you would be better off with an array of arrays:
use strict;
use warnings;
my @change;
while ( <DATA> ) {
push @change, [ split ];
}
print "@$_\n" for sort { $b->[2] <=> $a->[2] } @change;
__DATA__
gene1 condition1 10
gene2 condition1 0.5
gene3 condition1 1.5
gene1 condition2 2
gene2 condition2 13.5
gene3 condition2 0.25
Output:
gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25
If you explain what sort of access you need to the data then I am sure there is something better. For instance, I would suggest %gene and %condition hashes that map a gene or condition ID to a list of the array elements that use that gene or condition. Then you could access the data when you know either the gene or the condition.
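One way the %gene and %condition indexes might look is sketched below. This is only an illustration of the idea, not a definitive design: the rows are written inline rather than read from DATA, and @gene2 and @cond1 are made-up names for the lookups.

```perl
use strict;
use warnings;

# Rows as [ gene, condition, value ], the shape produced by push @change, [ split ].
my @change = (
    [ 'gene1', 'condition1', 10 ],
    [ 'gene2', 'condition1', 0.5 ],
    [ 'gene3', 'condition1', 1.5 ],
    [ 'gene1', 'condition2', 2 ],
    [ 'gene2', 'condition2', 13.5 ],
    [ 'gene3', 'condition2', 0.25 ],
);

# Each index maps an ID to the rows (array refs) that mention it.
# The rows are shared with @change, not copied.
my (%gene, %condition);
for my $row (@change) {
    push @{ $gene{ $row->[0] } },      $row;
    push @{ $condition{ $row->[1] } }, $row;
}

# All rows for gene2, largest change first.
my @gene2 = sort { $b->[2] <=> $a->[2] } @{ $gene{gene2} };
print "@$_\n" for @gene2;

# All rows for condition1, largest change first.
my @cond1 = sort { $b->[2] <=> $a->[2] } @{ $condition{condition1} };
print "@$_\n" for @cond1;
```

Because both indexes hold references to the same rows, a change made through one index is visible through the other and through @change.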
Upvotes: 5