SSN

Reputation: 886

Sorting a hash where keys contain non-alphanumeric characters

I have a hash like the following:

my %hash=( '(293 to 296)'   => 2,
           '(3118 to 3121)' => 2,
           '(330 to 333)'   => 2,
           '(2126 to 2129)' => 2,
           '(1999 to 2002)' => 2,
           '(2138 to 2141)' => 9,
           '(771 to 774)'   => 4,
           '(2016 to 2019)' => 1,
           '(888 to 891)'   => 5,
           '(3102 to 3105)' => 1,
        );

I want to sort my hash by its keys, which contain parentheses. I have tried the following code:

foreach $key(sort {$b <=> $a} keys %hash)
{
    print $key;
}

and I got the following, which is not numerically sorted:

(888 to 891)(2016 to 2019)(293 to 296)(3118 to 3121)(3102 to 3105)(330 to 333)(1999 to 2002)(2126 to 2129)(2138 to 2141)(771 to 774)

I expect output that is sorted numerically, as below. Please suggest a way to achieve this:

(293 to 296)
(330 to 333)
(771 to 774)
(888 to 891)
(1999 to 2002)
(2016 to 2019)
(2126 to 2129)
(2138 to 2141)
(3102 to 3105)
(3118 to 3121)                             

Upvotes: 3

Views: 590

Answers (5)

Borodin

Reputation: 126732

The issue is that a string like (293 to 296) has no numerical value. If you had use warnings 'all' in place, as you should, you would have seen multiple warnings like

Argument "(293 to 296)" isn't numeric in sort

and every key evaluates to zero, so they are all equal as far as sort is concerned.
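To see this concretely, here is a small sketch that numifies one key and compares two keys with <=>. It collects the warnings in an illustrative @warnings array through $SIG{__WARN__} instead of letting them go to STDERR.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my ($k1, $k2) = ('(293 to 296)', '(3118 to 3121)');

# A string that doesn't start with a number numifies to 0 (and warns)
my $value = $k1 + 0;

# Both operands numify to 0, so <=> reports them as equal
my $cmp = $k1 <=> $k2;

print "value: $value\n";
print "cmp:   $cmp\n";
print "warning: $warnings[0]";
```

Because <=> returns 0 for every pair, sort has no information to order the keys by.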

So you have to extract a number from each key to be used in a numerical sort. I would just snag the lower limit of each range and sort by that.

use strict;
use warnings 'all';
use feature 'say';

my %hash = (
    '(293 to 296)'   => 2,
    '(3118 to 3121)' => 2,
    '(330 to 333)'   => 2,
    '(2126 to 2129)' => 2,
    '(1999 to 2002)' => 2,
    '(2138 to 2141)' => 9,
    '(771 to 774)'   => 4,
    '(2016 to 2019)' => 1,
    '(888 to 891)'   => 5,
    '(3102 to 3105)' => 1,
);

my @keys = sort {
  my ($aa, $bb) = map /(\d+)/, $a, $b;
  $aa <=> $bb;
} keys %hash;

say for @keys;

Output:

(293 to 296)
(330 to 333)
(771 to 774)
(888 to 891)
(1999 to 2002)
(2016 to 2019)
(2126 to 2129)
(2138 to 2141)
(3102 to 3105)
(3118 to 3121)

This could be made even more concise by using the nsort_by function from List::MoreUtils or List::UtilsBy, like this:

use List::MoreUtils 'nsort_by';

say for nsort_by { /(\d+)/ and $1 } keys %hash;

The output from this code is identical to that of the above
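If you would rather not install a module, the same idea can be written in core Perl as a Schwartzian transform. This is a swapped-in technique, not part of the answer above: extract each key's first number once, sort on it, then discard it. The array name @sorted is illustrative.

```perl
use strict;
use warnings;

my %hash = (
    '(293 to 296)'   => 2,
    '(3118 to 3121)' => 2,
    '(330 to 333)'   => 2,
    '(2126 to 2129)' => 2,
    '(1999 to 2002)' => 2,
    '(2138 to 2141)' => 9,
    '(771 to 774)'   => 4,
    '(2016 to 2019)' => 1,
    '(888 to 891)'   => 5,
    '(3102 to 3105)' => 1,
);

# Schwartzian transform (read bottom to top):
#   1. map each key to [ first number in the key, key ]
#   2. sort those pairs numerically on the first number
#   3. map each pair back to the original key
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ (/(\d+)/)[0], $_ ] }
    keys %hash;

print "$_\n" for @sorted;
```

The regex runs once per key rather than once per comparison, which matters more as the hash grows.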

Upvotes: 2

mkHun

Reputation: 5927

Try this

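In the script below I use a substitution with the /r flag (Perl 5.14 or later) to reduce each key to the first number inside the parentheses. The /r flag returns the modified copy and leaves the original key untouched, so the sort compares numbers while the loop still prints the original keys.

use strict;
use warnings;

my %hash=( '(293 to 296)'   => 2,
           '(3118 to 3121)' => 2,
           '(330 to 333)'   => 2,
           '(2126 to 2129)' => 2,
           '(1999 to 2002)' => 2,
           '(2138 to 2141)' => 9,
           '(771 to 774)'   => 4,
           '(2016 to 2019)' => 1,
           '(888 to 891)'   => 5,
           '(3102 to 3105)' => 1,
        );

# Keep only the leading number; removing just "(" would leave
# "293 to 296)", which still triggers "isn't numeric" warnings
foreach my $i (sort { $a =~ s/^\((\d+).*/$1/r <=> $b =~ s/^\((\d+).*/$1/r } keys %hash)
{
    print "$i\n";
}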
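A quick way to check that /r leaves the key alone is to print both the original string and the returned copy. The variable names here are illustrative.

```perl
use strict;
use warnings;

my $key = '(293 to 296)';

# s///r returns the modified copy and does not touch $key
my $number = $key =~ s/^\((\d+).*/$1/r;

print "key:    $key\n";      # still (293 to 296)
print "number: $number\n";   # 293
```

Without /r, s/// would modify $key in place and return the number of substitutions made, which is not what a sort block should compare.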

Upvotes: 1

G. Cito

Reputation: 6378

You could use one of the CPAN modules that sort values "naturally" (e.g. Sort::Naturally).

This hides what is going on, though, so for educational purposes I prefer the explanations from @Sobrique, @Borodin and @Quentin.

use strict;
use warnings;
use Sort::Naturally;

# Each line read from DATA keeps its trailing newline, so print needs none
my @nsorted = nsort(<DATA>);
print @nsorted;

__DATA__
(293 to 296)
(3118 to 3121)
(330 to 333)
(2126 to 2129)
(1999 to 2002)
(2138 to 2141)
(771 to 774)
(2016 to 2019)
(888 to 891)
(3102 to 3105)

Output:

(293 to 296)
(330 to 333)
(771 to 774)
(888 to 891)
(1999 to 2002)
(2016 to 2019)
(2126 to 2129)
(2138 to 2141)
(3102 to 3105)
(3118 to 3121) 
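For a rough idea of what a natural sort does, here is a much-simplified comparison written from scratch. It is not Sort::Naturally's actual algorithm, and the name natural_cmp is hypothetical: it splits each string into runs of digits and non-digits, compares digit runs numerically and everything else as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified "natural" comparison.
# Splits each string into digit and non-digit chunks, then compares
# chunk by chunk: numerically when both chunks are digits, otherwise as strings.
sub natural_cmp {
    my ($x, $y) = @_;
    my @xs = $x =~ /(\d+|\D+)/g;
    my @ys = $y =~ /(\d+|\D+)/g;
    while (@xs && @ys) {
        my ($p, $q) = (shift @xs, shift @ys);
        my $cmp = ($p =~ /^\d/ && $q =~ /^\d/) ? $p <=> $q : $p cmp $q;
        return $cmp if $cmp;
    }
    # All shared chunks were equal: the string with fewer chunks sorts first
    return @xs <=> @ys;
}

my @ranges = ('(293 to 296)', '(3118 to 3121)', '(330 to 333)', '(2126 to 2129)');
my @sorted = sort { natural_cmp($a, $b) } @ranges;

print "$_\n" for @sorted;
```

Both keys start with the chunk "(", which compares equal as a string, so the order is decided by the first numeric chunk.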

Upvotes: 0

Sobrique

Reputation: 53488

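sort works by setting the package variables $a and $b to pairs of elements and calling your comparison block, which must return a negative number, zero or a positive number (typically -1, 0 or +1).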

The simplest - sorting on the first number - would go like this:

sort { $a =~ s/^\D*(\d+).*/$1/r <=> $b =~ s/^\D*(\d+).*/$1/r } keys %hash

This extracts the first numeric value from each key, compares and returns that comparison value.

Of course, if your ranges overlap, this may not work the way you want, and you'll have to get a bit more complicated. Suppose you have:

100 to 200
150 to 180
120 to 205

How should they be sorted? Either way, you write a subroutine that works on $a and $b and performs the comparison. A useful trick here is that the standard sort operators, <=> and cmp, return zero when their operands are equal, so comparisons can be chained with ||.

So:

sub compare_numbers {
   my @a = $a =~ m/(\d+)/g;
   my @b = $b =~ m/(\d+)/g; 
   return ( $a[0] <=> $b[0] 
         || $a[1] <=> $b[1] )
}

If the first comparison is zero, then the second is evaluated.
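Here is a self-contained sketch of that chaining. The ranges are made up (not from the question) so that two of them share a lower bound, which means the second comparison actually decides their order.

```perl
use strict;
use warnings;

# Compare on the first number, then on the second to break ties
sub compare_numbers {
    my @x = $a =~ m/(\d+)/g;
    my @y = $b =~ m/(\d+)/g;
    return $x[0] <=> $y[0]
        || $x[1] <=> $y[1];
}

# Illustrative data: two ranges start at 100
my @ranges = ('(120 to 205)', '(100 to 200)', '(100 to 150)');
my @sorted = sort compare_numbers @ranges;

print "$_\n" for @sorted;
```

Since <=> binds more tightly than ||, no extra parentheses are needed around each comparison.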

Or you can compare on a calculated value, such as the midpoint of each range:

sub compare_numbers {
   my @a = $a =~ m/(\d+)/g;
   my @b = $b =~ m/(\d+)/g;
   # Midpoint = half the width of the range, added to its lower bound
   return ( (($a[1] - $a[0]) / 2 + $a[0]) <=> (($b[1] - $b[0]) / 2 + $b[0]) );
}

You would use either of these in a similar way to above:

sort compare_numbers keys %hash 
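To see that sorting on the lower bound and sorting on the midpoint can disagree, here is a sketch using made-up overlapping ranges; the sub names by_start and by_midpoint are illustrative. The midpoint is computed as (start + end) / 2, which equals (end - start) / 2 + start.

```perl
use strict;
use warnings;

# Order ranges by their lower bound
sub by_start {
    my ($x) = $a =~ m/(\d+)/;
    my ($y) = $b =~ m/(\d+)/;
    return $x <=> $y;
}

# Order ranges by their midpoint
sub by_midpoint {
    my @x = $a =~ m/(\d+)/g;
    my @y = $b =~ m/(\d+)/g;
    my $mid_a = ($x[0] + $x[1]) / 2;
    my $mid_b = ($y[0] + $y[1]) / 2;
    return $mid_a <=> $mid_b;
}

my @ranges = ('(100 to 300)', '(150 to 180)', '(120 to 205)');

my @by_start    = sort by_start @ranges;
my @by_midpoint = sort by_midpoint @ranges;

print "by start:    @by_start\n";
print "by midpoint: @by_midpoint\n";
```

The wide range (100 to 300) comes first by lower bound but last by midpoint, so which comparison is "right" depends on what the ranges mean to you.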

Upvotes: 4

Quentin

Reputation: 943649

'(293 to 296)' isn't a number (and doesn't even begin with a number), so trying to sort it numerically doesn't make sense.

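You could extract the first number from each key inside the sort block and sort on that. Copy the numbers into new variables rather than assigning back to $a and $b, which are aliases to the elements being sorted:

sort { my ($x) = $a =~ /(\d+)/;
       my ($y) = $b =~ /(\d+)/;
       $x <=> $y } keys %hash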

Upvotes: 2
