Reputation: 733
I'm having a hard time wrapping my head around this, and I'm sure it's just a case of me being dense today. I have a data structure similar to this:
{
PDGFRA => [
{ "p.N659K" => 22 },
{ "p.D842Y" => 11 },
{ "p.I843_S847>T" => 9 },
{ "p.D842_H845del" => 35 },
{ "p.I843_D846delIMHD" => 24 },
{ "p.I843_D846del" => 21 },
{ "p.D842V" => 457 },
{ "p.N659Y" => 5 },
{ "p.M844_S847del" => 7 },
{ "p.S566_E571>K" => 8 },
{ "p.S566_E571>R" => 50 },
{ "p.V561D" => 54 },
],
}
I would like to print out the results, reverse sorted (greatest to smallest) by the hash values, so that I ultimately end up with something like this:
PDGFRA p.D842V 457
PDGFRA p.V561D 54
PDGFRA p.S566_E571>R 50
PDGFRA p.D842_H845del 35
.
.
.
etc.
I have no problem printing the data structure as I want, but I can't seem to figure out how to sort the data prior to printing it out. I've tried to sort like this:
for my $gene ( sort keys %matches ) {
for my $var ( sort { $a->{$var} <=> $b->{$var} } @{$matches{$gene}} ) {
print "$var\n";
}
}
But whether I use $var or $_ it doesn't seem to work, complaining that '$var' is not defined, and '$_' is not initialized. I also tried (really pathetically!) to use a Schwartzian transform, but I don't think I'm even close on this one:
for my $gene ( sort keys %matches ) {
my @sorted =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, $matches{$gene}->{$_} ] } @{$matches{$gene}};
}
Would someone mind pointing me in the right direction for sorting a hash of arrays of hashes by the internal hash value?
Upvotes: 2
Views: 309
Reputation: 42411
Your data structure appears to have unnecessary depth. But here's a Schwartzian Transform version that will sort it as you want.
for my $gene (sort keys %matches) {
my @sorted = map { +{@$_} } # Back to hash; the unary + makes Perl parse an anonymous hash, not a block.
sort { $b->[1] <=> $a->[1] } # Sort.
map { [each %$_] } # Unpack the 1-key hash.
@{$matches{$gene}};
}
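To actually print the sorted result, each one-key hash still has to be unpacked into its key and value. Here is a minimal sketch of that, assuming a trimmed-down copy of the question's data; the helper name sort_variants is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Trimmed-down copy of the question's structure (illustrative data only).
my %matches = (
    PDGFRA => [
        { 'p.N659K' => 22 },
        { 'p.D842V' => 457 },
        { 'p.V561D' => 54 },
    ],
);

# Hypothetical helper: sort a list of one-key hashes by value, greatest first.
sub sort_variants {
    my ($list) = @_;
    return map  { +{ $_->[0] => $_->[1] } }                 # back to a one-key hash
           sort { $b->[1] <=> $a->[1] }                     # descending by value
           map  { my ($k) = keys %$_; [ $k, $_->{$k} ] }    # unpack to [key, value]
           @$list;
}

for my $gene (sort keys %matches) {
    for my $item (sort_variants($matches{$gene})) {
        my ($var, $count) = %$item;    # a one-key hash flattens to (key, value)
        print "$gene $var $count\n";
    }
}
```

Using keys rather than each to unpack avoids depending on the state of the hash iterator.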
Upvotes: 0
Reputation: 126722
Here's an alternative. This one works by sorting a list of the array's indices into @sorted, so that you can then just iterate over the array slice @$list[@sorted].
You're hampered by the data format though, and it's messy just to extract a pair of values given a single-element hash.
I hope this helps.
use strict;
use warnings;
my %matches = (
PDGFRA => [
{ "p.N659K" => 22 },
{ "p.D842Y" => 11 },
{ "p.I843_S847>T" => 9 },
{ "p.D842_H845del" => 35 },
{ "p.I843_D846delIMHD" => 24 },
{ "p.I843_D846del" => 21 },
{ "p.D842V" => 457 },
{ "p.N659Y" => 5 },
{ "p.M844_S847del" => 7 },
{ "p.S566_E571>K" => 8 },
{ "p.S566_E571>R" => 50 },
{ "p.V561D" => 54 },
],
);
while (my ($key, $list) = each %matches) {
my @sorted = sort {
my ($ka, $va) = %{ $list->[$b] };
my ($kb, $vb) = %{ $list->[$a] };
$va <=> $vb;
} 0 .. $#$list;
for my $item (@$list[@sorted]) {
printf "%s %s %d\n", $key, %$item;
}
}
output
PDGFRA p.D842V 457
PDGFRA p.V561D 54
PDGFRA p.S566_E571>R 50
PDGFRA p.D842_H845del 35
PDGFRA p.I843_D846delIMHD 24
PDGFRA p.N659K 22
PDGFRA p.I843_D846del 21
PDGFRA p.D842Y 11
PDGFRA p.I843_S847>T 9
PDGFRA p.S566_E571>K 8
PDGFRA p.M844_S847del 7
PDGFRA p.N659Y 5
Update
In my comment on David W's answer I suggest a data structure that was just an array of two-element arrays, instead of an array of single-element hashes.
This is what that would look like. The output is identical to that of the code above.
use strict;
use warnings;
my %matches = (
PDGFRA => [
[ "p.N659K", 22 ],
[ "p.D842Y", 11 ],
[ "p.I843_S847>T", 9 ],
[ "p.D842_H845del", 35 ],
[ "p.I843_D846delIMHD", 24 ],
[ "p.I843_D846del", 21 ],
[ "p.D842V", 457 ],
[ "p.N659Y", 5 ],
[ "p.M844_S847del", 7 ],
[ "p.S566_E571>K", 8 ],
[ "p.S566_E571>R", 50 ],
[ "p.V561D", 54 ],
],
);
while (my ($key, $list) = each %matches) {
my @sorted = sort { $list->[$b][1] <=> $list->[$a][1] } 0 .. $#$list;
for my $item (@$list[@sorted]) {
printf "%s %s %d\n", $key, @$item;
}
}
Upvotes: 3
Reputation: 107040
You've already accepted an answer, but I want to comment on your data structure. Why the array containing hashes that each have only a single key? Why not simply get rid of the array?
$VAR1 = {
'PDGFRA' => {
'p.N659K' => 22,
'p.D842Y' => 11,
'p.I843_S847>T' => 9,
'p.D842_H845del' => 35,
'p.I843_D846delIMHD' => 24,
'p.I843_D846del' => 21,
'p.D842V' => 457,
'p.N659Y' => 5,
'p.M844_S847del' => 7,
'p.S566_E571>K' => 8,
'p.S566_E571>R' => 50,
'p.V561D' => 54,
},
};
This would greatly simplify your structure and make it easier to find the values enclosed in the innermost hash. You can access the keys directly.
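With that flattened layout, the sort the question asks for becomes a sort over the inner hash's keys, compared by their values. This is a rough sketch, assuming a shortened copy of the data; the sub name sorted_lines is made up for this example.

```perl
use strict;
use warnings;

# Flattened structure: gene => { variant => count } (shortened, illustrative data).
my %matches = (
    PDGFRA => {
        'p.N659K' => 22,
        'p.D842V' => 457,
        'p.V561D' => 54,
        'p.N659Y' => 5,
    },
);

# Hypothetical helper: build "gene variant count" lines, greatest count first.
sub sorted_lines {
    my ($data) = @_;
    my @lines;
    for my $gene (sort keys %$data) {
        my $variants = $data->{$gene};
        # Sort the variant names by their counts, descending.
        for my $var (sort { $variants->{$b} <=> $variants->{$a} } keys %$variants) {
            push @lines, "$gene $var $variants->{$var}";
        }
    }
    return @lines;
}

print "$_\n" for sorted_lines(\%matches);
```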
If the problem is that some of these hash keys may be duplicates, you can make that hash key point to an array of values:
$VAR1 = {
'PDGFRA' => {
'p.N659K' => 22,
'p.D842Y' => 11,
'p.I843_S847>T' => [
6,
9
],
'p.D842_H845del' => 35,
'p.I843_D846delIMHD' => 24,
'p.I843_D846del' => 21,
...
Note that p.I843_S847>T contains both 6 and 9. You could simplify by making every value of the inner hash a reference to an array (99% of those arrays may contain a single value), or you could use the ref function to detect whether a value is a scalar or a reference to an array. Either way, you still get the faster lookup and easier access to the keys in that hash, so you can sort them.
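The ref check might look something like the sketch below. The data is a shortened, illustrative version of the structure above, the helper names counts_of and max_count are hypothetical, and sorting by the largest stored count is just one possible choice of sort key.

```perl
use strict;
use warnings;

# Illustrative data: most values are plain numbers, one is an array reference.
my %variants = (
    'p.N659K'       => 22,
    'p.I843_S847>T' => [ 6, 9 ],
    'p.D842V'       => 457,
);

# Hypothetical helper: return the counts as a list, whatever shape the value has.
sub counts_of {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Hypothetical helper: the largest count stored for a variant.
sub max_count {
    my ($value) = @_;
    my ($max) = sort { $b <=> $a } counts_of($value);
    return $max;
}

# Print variants ordered by their largest count, descending.
for my $var (sort { max_count($variants{$b}) <=> max_count($variants{$a}) } keys %variants) {
    print "$var: ", join(', ', counts_of($variants{$var})), "\n";
}
```

Storing every value as an array reference would remove the need for the ref test, at the cost of an extra level of dereferencing everywhere.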
Since you are familiar with using complex data structures in Perl, you should also learn how object-oriented Perl works. This will make it a lot easier to handle these structures, and will also help you in development because it gives you a clean way of relating to them.
Upvotes: 2
Reputation: 5171
Assuming $VAR above is Data::Dumper::Dumper(\%matches)...
If you don't want to make your data-structure nicer....
for my $gene ( sort keys %matches ) {
for my $hash ( sort {
my ($akey) = keys %$a;
my ($bkey) = keys %$b;
$b->{$bkey} <=> $a->{$akey} # descending: greatest value first
} @{$matches{$gene}} ) {
my ($key) = keys %$hash;
print "$key => $hash->{$key}\n";
}
}
That sorts by the value (e.g: 12) not the key (e.g: 'p.I843_D846del'). I figured since you used a numeric comparison that you'd want to sort by the numeric value ;-)
edited: fixed body of inner loop.
edit 2:
I see you tried a Schwartzian Transform... if you keep your data-structure as is, that might be a more efficient solution... as follows:
for my $gene ( sort keys %matches ) {
print "$_->[0] => $_->[1]\n" for # print both key and value
sort { $b->[1] <=> $a->[1] } # sort by value, greatest first (e.g: 35)
map { my ($key) = keys %$_; [$key, $_->{$key}] } # e.g:['p.D842_H845del' ,35]
@{$matches{$gene}};
}
But instead, I'd just fix the data structure.
Probably just store both the 'key' (e.g: 'p.I843_D846del') and the 'value' (e.g: 12) as values, each under a consistent key name.
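Something along these lines, as a sketch only: the field names variant and count and the helper by_count_desc are my own picks for illustration, and any consistent names would do.

```perl
use strict;
use warnings;

# Each record uses the same two (hypothetical) field names.
my %matches = (
    PDGFRA => [
        { variant => 'p.N659K', count => 22 },
        { variant => 'p.D842V', count => 457 },
        { variant => 'p.V561D', count => 54 },
    ],
);

# Hypothetical helper: sort records by their count field, greatest first.
sub by_count_desc {
    my ($records) = @_;
    return sort { $b->{count} <=> $a->{count} } @$records;
}

for my $gene (sort keys %matches) {
    print "$gene $_->{variant} $_->{count}\n" for by_count_desc($matches{$gene});
}
```

With fixed field names the sort block no longer has to discover each hash's single key before comparing.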
Upvotes: 1