Reputation: 733

Sorting a Hash of Array of Hashes by Internal Hash Value

I'm having a hard time wrapping my head around this, and I'm sure it's just a case of me being dense today. I have a data structure similar to this:

{
  PDGFRA => [
    { "p.N659K" => 22 },
    { "p.D842Y" => 11 },
    { "p.I843_S847>T" => 9 },
    { "p.D842_H845del" => 35 },
    { "p.I843_D846delIMHD" => 24 },
    { "p.I843_D846del" => 21 },
    { "p.D842V" => 457 },
    { "p.N659Y" => 5 },
    { "p.M844_S847del" => 7 },
    { "p.S566_E571>K" => 8 },
    { "p.S566_E571>R" => 50 },
    { "p.V561D" => 54 },
  ],
}

I would like to print out the results, reverse sorted (greatest to smallest) by the hash values, so ultimately I end up with something like this:

PDGFRA    p.D842V    457
PDGFRA    p.V561D    54
PDGFRA    p.S566_E571>R    50
PDGFRA    p.D842_H845del    35
.
.
.
etc.

I have no problem printing the data structure as I want, but I can't seem to figure out how to sort the data prior to printing it out. I've tried to sort like this:

for my $gene ( sort keys %matches ) {
    for my $var ( sort { $a->{$var} <=> $b->{$var} } @{$matches{$gene}} {
        print "$var\n";
    }
 }

But whether I use $var or $_ it doesn't seem to work, complaining that '$var' is not defined, and '$_' is not initialized. I also tried (really pathetically!) to use a Schwartzian transform, but I don't think I'm even close on this one:

for my $gene ( sort keys %matches ) {
    my @sorted = 
        map { $_->[0] } 
        sort { $a->[1] <=> $b->[1] } 
        map { [ $_, $matches{$gene}->{$_} ] } @{$matches{$gene}};
}

Would someone mind pointing me in the right direction for sorting a hash of arrays of hashes by internal hash value?

Upvotes: 2

Answers (4)

FMc

Reputation: 42411

Your data structure appears to have unnecessary depth. But here's a Schwartzian Transform version that will sort it as you want.

for my $gene (sort keys %matches) {
    my @sorted = map  { +{ @$_ } }             # Back to hash; the + forces a hash constructor, not a block.
                 sort { $b->[1] <=> $a->[1] }  # Sort by value, descending.
                 map  { [ %$_ ] }              # Unpack the 1-key hash into [key, value].
                 @{$matches{$gene}};
}

Upvotes: 0

Borodin

Reputation: 126722

Here's an alternative. This one works by sorting a list of the array's indices and storing the result in @sorted, so that you can then just iterate over the array slice @$list[@sorted].

You're hampered by the data format though, and it's messy just to extract a pair of values given a single-element hash.

I hope this helps.

use strict;
use warnings;

my %matches = (
  PDGFRA => [
    { "p.N659K"            =>  22 },
    { "p.D842Y"            =>  11 },
    { "p.I843_S847>T"      =>   9 },
    { "p.D842_H845del"     =>  35 },
    { "p.I843_D846delIMHD" =>  24 },
    { "p.I843_D846del"     =>  21 },
    { "p.D842V"            => 457 },
    { "p.N659Y"            =>   5 },
    { "p.M844_S847del"     =>   7 },
    { "p.S566_E571>K"      =>   8 },
    { "p.S566_E571>R"      =>  50 },
    { "p.V561D"            =>  54 },
  ],
);

while (my ($key, $list) = each %matches) {

  my @sorted = sort {
    my ($ka, $va) = %{ $list->[$a] };
    my ($kb, $vb) = %{ $list->[$b] };
    $vb <=> $va;    # descending by value
  } 0 .. $#$list;

  for my $item (@$list[@sorted]) {
    printf "%s   %s     %d\n", $key, %$item;
  }
}

output

PDGFRA   p.D842V     457
PDGFRA   p.V561D     54
PDGFRA   p.S566_E571>R     50
PDGFRA   p.D842_H845del     35
PDGFRA   p.I843_D846delIMHD     24
PDGFRA   p.N659K     22
PDGFRA   p.I843_D846del     21
PDGFRA   p.D842Y     11
PDGFRA   p.I843_S847>T     9
PDGFRA   p.S566_E571>K     8
PDGFRA   p.M844_S847del     7
PDGFRA   p.N659Y     5

Update

In my comment on David W's answer I suggested a data structure that is just an array of two-element arrays, instead of an array of single-element hashes.

This is what that would look like. The output is identical to that of the code above.

use strict;
use warnings;

my %matches = (
  PDGFRA => [
    [ "p.N659K",             22 ],
    [ "p.D842Y",             11 ],
    [ "p.I843_S847>T",        9 ],
    [ "p.D842_H845del",      35 ],
    [ "p.I843_D846delIMHD",  24 ],
    [ "p.I843_D846del",      21 ],
    [ "p.D842V",            457 ],
    [ "p.N659Y",              5 ],
    [ "p.M844_S847del",       7 ],
    [ "p.S566_E571>K",        8 ],
    [ "p.S566_E571>R",       50 ],
    [ "p.V561D",             54 ],
  ],
);

while (my ($key, $list) = each %matches) {

  my @sorted = sort { $list->[$b][1] <=> $list->[$a][1] } 0 .. $#$list;

  for my $item (@$list[@sorted]) {
    printf "%s   %s     %d\n", $key, @$item;
  }
}

Upvotes: 3

David W.

Reputation: 107040

You've already accepted an answer, but I want to comment on your data structure. You have:

A hash.
That hash contains arrays.
Those arrays contain hashes that each have only a single key.

Why the array which contains the hashes with only a single key? Why not simply get rid of the array?

$VAR1 = {
      'PDGFRA' =>  {
                      'p.N659K' => 22,
                      'p.D842Y' => 11,
                      'p.I843_S847>T' => 9,
                      'p.D842_H845del' => 35,
                      'p.I843_D846delIMHD' => 24,
                      'p.I843_D846del' => 21,
                      'p.D842V' => 457,
                      'p.N659Y' => 5,
                      'p.M844_S847del' => 7,
                      'p.S566_E571>K' => 8,
                      'p.S566_E571>R' => 50,
                      'p.V561D' => 54,
                   }
     };

This would greatly simplify your structure and make it easier to find the values enclosed in the inner most hash. You can access the keys directly.
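
As a minimal sketch of what that simplification buys, assuming the hash-of-hashes layout above (only a few of the variants are reproduced here as sample data), sorting by count becomes a single sort over the inner keys:

```perl
use strict;
use warnings;

# Hash of hashes: gene => { variant => count }. Sample data only.
my %matches = (
    PDGFRA => {
        'p.N659K' => 22,
        'p.D842V' => 457,
        'p.V561D' => 54,
        'p.N659Y' => 5,
    },
);

my @lines;
for my $gene ( sort keys %matches ) {
    my $variants = $matches{$gene};

    # Sort the variant names by their counts, largest first.
    for my $var ( sort { $variants->{$b} <=> $variants->{$a} } keys %$variants ) {
        push @lines, join "\t", $gene, $var, $variants->{$var};
    }
}

print "$_\n" for @lines;
```

The comparison looks each key up directly in the inner hash, so there is no need to unpack single-key hashes inside the sort block.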

If the problem is that some of these hash keys may be duplicates, you can make that hash key point to an array of values:

$VAR1 = {
      'PDGFRA' =>  {
                      'p.N659K' => 22,
                      'p.D842Y' => 11,
                      'p.I843_S847>T' => [
                                            6,
                                            9
                                          ],
                      'p.D842_H845del' => 35,
                      'p.I843_D846delIMHD' => 24,
                      'p.I843_D846del' => 21,
                       ...

Note that p.I843_S847>T contains both 6 and 9. You could simplify and make every value of the inner hash a reference to an array (99% of those arrays may contain a single value), or you could detect with the ref function whether the content is a scalar or a reference to an array. Either way, you still have the benefit of faster lookup and easier access to the keys in that hash, so you can sort them.
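
A rough sketch of the ref check, assuming inner values that are either a plain count or a reference to an array of counts. The helper name counts_for is made up for illustration:

```perl
use strict;
use warnings;

# Inner values are either a plain count or a reference to an array of counts.
my %variants = (
    'p.N659K'       => 22,
    'p.I843_S847>T' => [ 6, 9 ],
);

# Hypothetical helper: always hand back a list of counts,
# whichever of the two forms the value takes.
sub counts_for {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

for my $var ( sort keys %variants ) {
    my @counts = counts_for( $variants{$var} );
    print "$var: @counts\n";
}
```

Funnelling every access through one helper like this keeps the scalar-or-array decision in a single place.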

Since you are familiar with using complex data structures in Perl, you should also learn how Object Oriented Perl works. This will make it a lot easier to handle these structures, and will also help you in development because it gives you a clean way of relating to these complex structures.
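
To give a flavour of that, here is a small sketch of a class that hides the nested structure behind methods. The class name VariantCounts and its methods are hypothetical, and summing the counts of a repeated variant is just one possible design choice:

```perl
use strict;
use warnings;

# Hypothetical class; the name and methods are illustrative only.
package VariantCounts;

sub new {
    my ($class) = @_;
    return bless { counts => {} }, $class;
}

# Record a count for a variant of a gene; repeated variants are summed.
sub add {
    my ( $self, $gene, $variant, $count ) = @_;
    $self->{counts}{$gene}{$variant} += $count;
    return $self;
}

# Return [variant, count] pairs for one gene, largest count first.
sub sorted_variants {
    my ( $self, $gene ) = @_;
    my $v = $self->{counts}{$gene} || {};
    return map { [ $_, $v->{$_} ] }
           sort { $v->{$b} <=> $v->{$a} }
           keys %$v;
}

package main;

my $vc = VariantCounts->new;
$vc->add( 'PDGFRA', 'p.N659K', 22 );
$vc->add( 'PDGFRA', 'p.D842V', 457 );
$vc->add( 'PDGFRA', 'p.V561D', 54 );

printf "%s\t%s\t%d\n", 'PDGFRA', @$_ for $vc->sorted_variants('PDGFRA');
```

The calling code never touches the nested hashes, so the internal layout can change later without affecting the code that prints the report.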

Upvotes: 2

David-SkyMesh

Reputation: 5171

Assuming $VAR1 above is Data::Dumper::Dumper(\%matches) ...

If you don't want to make your data-structure nicer....

for my $gene ( sort keys %matches ) {
    for my $hash ( sort { 
                      my ($akey) = keys %$a;
                      my ($bkey) = keys %$b;
                      $b->{$bkey} <=> $a->{$akey}   # descending, as the question asks
                   } @{$matches{$gene}} ) {
        my ($key) = keys %$hash;
        print "$key => $hash->{$key}\n";
    }
}

That sorts by the value (e.g: 12) not the key (e.g: 'p.I843_D846del'). I figured since you used a numeric comparison that you'd want to sort by the numeric value ;-)

edited: fixed body of inner loop.

edit 2:

I see you tried a Schwartzian Transform... if you keep your data-structure as is, that might be a more efficient solution... as follows:

for my $gene ( sort keys %matches ) {
    print "$_->[0] => $_->[1]\n" for                     # print both key and value
        sort { $b->[1] <=> $a->[1] }                     # sort by value (e.g: 35), descending
        map { my ($key) = keys %$_; [$key, $_->{$key}] } # e.g:['p.D842_H845del' ,35]
        @{$matches{$gene}};
}

But instead, I'd just fix the data structure.

Probably just store the 'key' (e.g: 'p.I843_D846del') and the 'value' (e.g: 12) as values, and give them consistent key names.
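
Something along these lines, as a sketch only: the key names variant and count are placeholders chosen for illustration, and just a few records are shown.

```perl
use strict;
use warnings;

# Each record has the same two keys; 'variant' and 'count' are illustrative names.
my %matches = (
    PDGFRA => [
        { variant => 'p.N659K', count => 22 },
        { variant => 'p.D842V', count => 457 },
        { variant => 'p.V561D', count => 54 },
    ],
);

my @rows;
for my $gene ( sort keys %matches ) {
    # With a fixed key name the comparison needs no unpacking at all.
    for my $rec ( sort { $b->{count} <=> $a->{count} } @{ $matches{$gene} } ) {
        push @rows, "$gene\t$rec->{variant}\t$rec->{count}";
    }
}

print "$_\n" for @rows;
```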

Upvotes: 1
