Soumya
Soumya

Reputation: 893

Parametric sorting of a list of hashes

My aim is to write a subroutine which takes in

  1. An array of hashes
  2. A list containing the sort order

Just to be clear - the keys may be anything. My example is just for reference.


Given an array containing a list of keys in their required sort order

my @aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );

My idea is to form a string

$a->{DELTA1_2} <=> $b->{DELTA1_2} or $a->{SET1} <=> $b->{SET1} or $a->{SET2} <=> $b->{SET2}

and then execute it with eval.

Here's my code

my $paRecords = [
    { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
    { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
    { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
    { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];
my @aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );
my $pStr = '';

foreach ( @aSortOrder ) {
    $pStr = $pStr . ' or $a->{' . $_ . '} <=> $b->{' . $_ . '}';
}

$pStr =~ s/^\s*or\s*//;

my @aSorted = sort { eval "$pStr"; } @$paRecords;

print Dumper \@aSorted;

output

$VAR1 = [
          {
            'SET1' => 8328,
            'SET2' => 8299,
            'DELTA1_2' => -29
          },
          {
            'SET1' => 48265,
            'SET2' => 48264,
            'DELTA1_2' => -1
          },
          {
            'SET2' => 0,
            'DELTA1_2' => 0,
            'SET1' => 10
          },
          {
            'SET2' => 0,
            'DELTA1_2' => 0,
            'SET1' => 20
          }
        ];

I guess that this is far from the ideal approach to solving the problem, so any pointer on how this problem could be better solved would be a great help.

Upvotes: 0

Views: 83

Answers (3)

Borodin
Borodin

Reputation: 126722

The block passed to sort may contain any amount of code. It is required only to evaluate to a negative number, zero, or a positive number according to whether $a should be considered to be less than, equal to, or great than $b

I agree with your decision to bundle this into a subroutine, so I have written sort_hashes_by_keys, which expects a reference to an array of hashes to be sorted, and a reference to an array of key strings. It returns a list of hashes sorted according to the list of keys

use strict;
use warnings 'all';

use Data::Dump 'dd';

my $records =  [
    { SET1 => 48265, DELTA1_2 => -1,  SET2 => 48264 },
    { SET1 => 8328,  DELTA1_2 => -29, SET2 => 8299  },
    { SET1 => 20,    DELTA1_2 => 0,   SET2 => 0     },
    { SET1 => 10,    DELTA1_2 => 0,   SET2 => 0     }
];

my @sort_order = qw/ DELTA1_2 SET1 SET2 /;

my @sorted = sort_hashes_by_keys( $records, \@sort_order );

dd \@sorted;



sub sort_hashes_by_keys {
    my ( $hashes, $order ) = @_;

    sort {

        my $cmp = 0;

        for my $key ( @$order ) {
            last if $cmp = $a->{$key} <=> $b->{$key};
        }

        $cmp;

    } @$hashes;
}

output

[
  { DELTA1_2 => -29, SET1 => 8328, SET2 => 8299 },
  { DELTA1_2 => -1, SET1 => 48265, SET2 => 48264 },
  { DELTA1_2 => 0, SET1 => 10, SET2 => 0 },
  { DELTA1_2 => 0, SET1 => 20, SET2 => 0 },
]


Note that I strongly advise against both hungarian notation and camel case when naming your variables. Perl is not strictly typed, and it has sigils like $, @ and % which indicate the type of every variable, so hungarian notation is superfluous at best, and also adds distracting and irrelevant noise. Also, by convention, capital letters are reserved for module names and global variables, so local identifiers should be in "snake case", i.e. lower-case letters and underscores. Many non-English speakers also find camel case difficult to parse

Upvotes: 1

ikegami
ikegami

Reputation: 385657

Just create a sub that does the comparison.

sub custom_cmp {
   my $keys = shift;
   for my $key (@$keys) {
      my $cmp = $_[0]{$key} <=> $_[1]{$key};
      return $cmp if $cmp;
   }

   return 0;
}

my @aSorted = sort { custom_cmp(\@aSortOrder, $a, $b) } @$paRecords;

The above makes two sub calls for each comparison. If we generate the compare function, we can reduce that to one.

sub make_custom_cmp {
   my @keys = @_;
   return sub($$) {
      for my $key (@keys) {
         my $cmp = $_[0]{$key} <=> $_[1]{$key};
         return $cmp if $cmp;
      }

      return 0;
   };
}

my $cmp = make_custom_cmp(@aSortOrder);

my @aSorted = sort $cmp @$paRecords;

We could go one further and flatten the loop through code generation. This is what a "proper" eval-based solution would look like. However, this level of optimization is hardly needed.

sub make_custom_cmp {
   my @keys = @_;
   my @cmps;
   for $i (0..$#keys) {
      push @cmps, "\$_[0]{\$keys[$i]} <=> \$_[1]{\$keys[$i]}"
   }

   return eval("sub($$) { ".( join(" || ", @cmps) )."}");
}

my $cmp = make_custom_cmp(@aSortOrder);

my @aSorted = sort $cmp @$paRecords;

In fact, the following is probably the most performant solution:

my @aSorted =
   map $paRecords->[ unpack('N', substr($_, -4))-0x7FFFFFFF ],
      sort
         map pack('N*', map $_+0x7FFFFFFF, @{ $paRecords->[$_] }{@aSortOrder}, $_),
            0..$#$paRecords;

Upvotes: 3

Sobrique
Sobrique

Reputation: 53478

Well, you're quite right - using eval like that is a road to future pain.

The joy of 'sort' is that you can define a sort subroutine, that implicitly defines $a and $b and you can use whatever logic you desire to decide if it's a positive, negative or 'zero' comparison (equal). (e.g. like <=> or cmp do).

The trick here is - 'true' is anything non zero, so <=> you can test for 'true' to see if there's a comparison to be made ( 4 <=> 4 is 'false')

So if you're just working numerically (you'd need to test for 'alphanumeric' and use cmp in some cases there, but doesn't seem to apply to your data):

#!/usr/bin/env perl
use strict;
use warnings;

my $paRecords = [
   { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
   { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
   { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
   { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];

#qw is 'quote-words' and just lets you space delimit terms. 
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my @order = qw ( DELTA1_2 SET1 SET2 );

#note - needs to come after definition of `@order` but it can be re-written later as long as it's in scope. 
#you can pass an order explicitly into the subroutine if you want though. 
sub order_by {
   for my $key (@order) {
      #compare key
      my $result = $a->{$key} <=> $b->{$key};
      #return it and exit the loop if they aren't equal, otherwise 
      #continue iterating sort terms. 
      return $result if $result;
   }
   return 0; #all keys were similar, therefore return zero.
}

print join (",", @order), "\n";
foreach my $record ( sort {order_by} @$paRecords ) {
   #use hash slice to order output in 'sort order'. 
   #optional, but hopefully clarifies what's going on. 
   print join (",", @{$record}{@order}), "\n";
}

This, given your data outputs:

DELTA1_2,SET1,SET2
-29,8328,8299
-1,48265,48264
0,10,0
0,20,0

Note, I've opted to use hash slice for your output, because otherwise hashes are unordered, and so your Dumper output will be inconsistent (randomly ordered fields).

If you need to be a little more dynamic about your ordering, you can pass it into the sort-sub:

#!/usr/bin/env perl
use strict;
use warnings;

sub order_by {
   for my $key (@_) {
      #compare key
      my $result = $a->{$key} <=> $b->{$key};

      #return it and exit the loop if they aren't equal, otherwise
      #continue iterating sort terms.
      return $result if $result;
   }
   return 0;    #all keys were similar, therefore return zero.
}

my $paRecords = [
   { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
   { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
   { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
   { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];

#qw is 'quote-words' and just lets you space delimit terms.
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my @order = qw ( DELTA1_2 SET1 SET2 );

print join( ",", @order ), "\n";
foreach my $record ( sort {order_by ( @order ) } @$paRecords ) {

   #use hash slice to order output in 'sort order'.
   #optional, but hopefully clarifies what's going on.
   print join( ",", @{$record}{@order} ), "\n";
}

Upvotes: 0

Related Questions