Nick Messick

Reputation: 3212

Using Perl, how can I sort an array using the value of a number inside each array element?

Let's say I have an array, @theArr, which holds 1,000 or so elements such as the following:

01  '12 16 sj.1012804p1012831.93.gz'
02  '12 16 sj.1012832p1012859.94.gz'
03  '12 16 sj.1012860p1012887.95.gz'
04  '12 16 sj.1012888p1012915.96.gz'
05  '12 16 sj.1012916p1012943.97.gz'
06  '12 16 sj.875352p875407.01.gz'
07  '12 16 sj.875408p875435.02.gz'
08  '12 16 sj.875436p875535.03.gz'
09  '12 16 sj.875536p875575.04.gz'
10  '12 16 sj.875576p875603.05.gz'
11  '12 16 sj.875604p875631.06.gz'
12  '12 16 sj.875632p875659.07.gz'
13  '12 16 sj.875660p875687.08.gz'
14  '12 16 sj.875688p875715.09.gz'
15  '12 16 sj.875716p875743.10.gz'
...

If the first number (between the 'sj.' and the 'p') were always 6 digits, I wouldn't have a problem. But once the numbers roll over to 7 digits, the default sort stops working: it compares the elements as strings, so the larger 7-digit numbers come before the smaller 6-digit ones.
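A minimal sketch of why this happens, using two elements from the list above. Perl's default sort compares whole strings character by character (like cmp). After the shared prefix '12 16 sj.', it compares '1' with '8', so the 7-digit entry moves to the front. The bare numbers show the same difference between cmp (string) and <=> (numeric):

```perl
use strict;
use warnings;

# Two elements from the list above, already in the desired numeric order
my @theArr = ('12 16 sj.875352p875407.01.gz', '12 16 sj.1012804p1012831.93.gz');

# Default sort: string comparison, and '1' lt '8', so sj.1012804 comes first
my @default = sort @theArr;
print "$_\n" for @default;

# The same effect on the bare numbers: cmp is lexical, <=> is numeric
print join(' ', sort { $a cmp $b } 875352, 1012804), "\n";   # 1012804 875352
print join(' ', sort { $a <=> $b } 875352, 1012804), "\n";   # 875352 1012804
```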

Is there a way to tell Perl to sort by that number inside the string in each array element?

Upvotes: 4

Views: 5034

Answers (4)

Matt K

Reputation: 13842

You can use a regex to pull the number out of every line inside the block you pass to the sort function:

my @newArray = sort { my ($anum) = $a =~ /sj\.(\d+)p/; my ($bnum) = $b =~ /sj\.(\d+)p/; $anum <=> $bnum } @theArr;

However, Chas. Owens's solution is better, since it runs the regex only once per element rather than twice per comparison.
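To make the efficiency point concrete, here is a sketch that counts regex matches for both approaches. The test data and the extract_sj helper are hypothetical, made up for illustration. The exact number of comparisons depends on Perl's sort implementation, so only the relative counts matter. Any sort of n elements needs at least n - 1 comparisons, so extracting inside the block always costs more than extracting once per element:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical test data: 1,000 strings in the question's format whose
# sj numbers cross from 6 to 7 digits, in random order
my @theArr = shuffle map {
    my $i = $_;
    sprintf '12 16 sj.%dp%d.%02d.gz', 875352 + 200 * $i, 875379 + 200 * $i, $i % 100;
} 0 .. 999;

my $calls = 0;

# Hypothetical helper: extract the number between "sj." and "p", counting calls
sub extract_sj {
    my ($s) = @_;
    $calls++;
    my ($n) = $s =~ /sj\.(\d+)p/;
    return $n;
}

# Extraction inside the comparison block: two regex matches per comparison
$calls = 0;
my @per_comparison = sort { extract_sj($a) <=> extract_sj($b) } @theArr;
my $calls_per_comparison = $calls;

# Schwartzian Transform: one regex match per element
$calls = 0;
my @per_element = map  { $_->[1] }
                  sort { $a->[0] <=> $b->[0] }
                  map  { [ extract_sj($_), $_ ] } @theArr;
my $calls_per_element = $calls;

print "extract inside sort block: $calls_per_comparison regex matches\n";
print "Schwartzian Transform:     $calls_per_element regex matches\n";
```

Both versions produce the same order. The difference is how often the regex runs, which grows with the number of comparisons (roughly n log n) instead of with n.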

Upvotes: 3

Plate

Reputation: 81

Here's an example that sorts them ascending, assuming you don't care too much about efficiency:

use strict;
use warnings;

my @theArr = split(/\n/, <<END_SAMPLE);
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
END_SAMPLE

my @sortedArr = sort compareBySJ @theArr;

print "Before:\n".join("\n", @theArr)."\n";
print "After:\n".join("\n", @sortedArr)."\n";

sub compareBySJ {
    # Capture the values to compare, against the expected format
    # NOTE: This could be inefficient for large, unsorted arrays
    #       since you'll be matching the same strings repeatedly
    my ($aVal) = $a =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";
    my ($bVal) = $b =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $b";

    # Return the numerical comparison of the values (ascending order)
    return $aVal <=> $bVal;
}

Outputs:

Before:
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
After:
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz

Upvotes: 2

Chas. Owens

Reputation: 64909

Looks like you need a Schwartzian Transform:

#!/usr/bin/perl

use strict;
use warnings;

my @a = <DATA>;

print 
    map  { $_->[1] }                            #get the original value back
    sort { $a->[0] <=> $b->[0] }                #sort arrayrefs numerically on the sort value
    map  { my ($n) = /sj\.(\d+)p/; [$n, $_] }   #build arrayref of the sort value and orig
    @a;

__DATA__
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.875604p875631.06.gz
12 16 sj.875632p875659.07.gz
12 16 sj.875660p875687.08.gz
12 16 sj.875688p875715.09.gz
12 16 sj.875716p875743.10.gz
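
The chain above reads from bottom to top. Below is a sketch of the same transform split into three separate steps. The intermediate array names are hypothetical and exist only to show what each stage produces. The regex assumes every element contains "sj." followed by digits and a "p":

```perl
use strict;
use warnings;

my @theArr = (
    '12 16 sj.1012804p1012831.93.gz',
    '12 16 sj.875352p875407.01.gz',
    '12 16 sj.875408p875435.02.gz',
);

# Step 1 (decorate): pair each element with its numeric sort key
my @decorated = map { my ($n) = /sj\.(\d+)p/; [$n, $_] } @theArr;

# Step 2 (sort): compare only the precomputed keys, numerically
my @sorted_pairs = sort { $a->[0] <=> $b->[0] } @decorated;

# Step 3 (undecorate): keep just the original strings
my @sorted = map { $_->[1] } @sorted_pairs;

print "$_\n" for @sorted;
```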

Upvotes: 18

RBerteig

Reputation: 43286

Yes. sort takes an optional comparison routine that is called to compare two elements, which it sees as $a and $b. The routine can be either a block of code or the name of a subroutine.
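Both forms side by side, as a minimal sketch. The subroutine name by_sj_number is hypothetical, and the regex assumes every element contains "sj." followed by digits and a "p". A named comparison sub uses the same $a and $b as a block does, so it should be called from the package where it is defined:

```perl
use strict;
use warnings;

my @theArr = ('12 16 sj.1012804p1012831.93.gz', '12 16 sj.875352p875407.01.gz');

# Form 1: an inline block; $a and $b are the two elements being compared
my @by_block = sort {
    my ($x) = $a =~ /sj\.(\d+)p/;
    my ($y) = $b =~ /sj\.(\d+)p/;
    $x <=> $y;
} @theArr;

# Form 2: the name of a subroutine that follows the same $a/$b convention
sub by_sj_number {
    my ($x) = $a =~ /sj\.(\d+)p/;
    my ($y) = $b =~ /sj\.(\d+)p/;
    return $x <=> $y;
}
my @by_name = sort by_sj_number @theArr;

print "$_\n" for @by_name;
```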

There is an example in the documentation for sort (perldoc -f sort) that is similar to what you want to do:

# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise

@new = sort {
    ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
                        ||
                uc($a)  cmp  uc($b)
} @old;
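A sketch of that idiom adapted to the strings in the question, under the assumption that every element contains "sj." followed by digits and a "p". It captures the digits after "sj." instead of after "=", and swaps $a and $b to sort ascending. The case-insensitive string comparison after || only decides ties. Like the documentation's version, it is inefficient because the regex runs on every comparison:

```perl
use strict;
use warnings;

my @old = (
    '12 16 sj.1012804p1012831.93.gz',
    '12 16 sj.875352p875407.01.gz',
    '12 16 sj.875408p875435.02.gz',
);

# Ascending numeric compare on the digits after "sj.", falling back to a
# case-insensitive string compare when the numbers are equal
my @new = sort {
    ($a =~ /sj\.(\d+)p/)[0] <=> ($b =~ /sj\.(\d+)p/)[0]
                            ||
                    uc($a)  cmp  uc($b)
} @old;

print "$_\n" for @new;
```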

Upvotes: 1
