eslam saad
eslam saad

Reputation: 73

Perl: Sort part of array

I have an array with many fields in each line spaced by different spacing like:

INDDUMMY   drawing2   139       30        1        0        0        0        0        0
RMDUMMY    drawing2   69        2         1        0        0        0        0        0  
PIMP       drawing    7         0         1444     718      437      0        0        0

I'm trying to make sorting for this array by number in 3rd field so the desired output should be:

PIMP       drawing    7         0         1444     718      437      0        0        0
RMDUMMY    drawing2   69        2         1        0        0        0        0        0
INDDUMMY   drawing2   139       30        1        0        0        0        0        0

I tried to make a split using regular expression within the sorting function like:

@sortedListOfLayers = sort {
    split(m/\w+\s+(\d+)\s/gm,$a)
    cmp
    split(m/\w+\s+(\d+)\s/gm,$b)
}@listOfLayers;

but it doesn't work correctly. How I could make that type of sorting?

Upvotes: 1

Views: 434

Answers (2)

Benjamin W.
Benjamin W.

Reputation: 52112

If you want to sort a list and the operation used in the sort block is expensive, an often used Perl idiom is the Schwartzian Transform: you apply the operation once to each list element and store the result alongside the original element, sort, then map back to your original format.

The classic textbook example is sorting files in a directory by size using the expensive -s file test. A naïve approach would be

my @sorted = sort { -s $a <=> -s $b } @unsorted;

which has to perform -s twice for each comparison operation.

Using the Schwartzian Transform, we map the file names into a list of array references, each referencing an array containing the list element and its size (which has to be determined only once per file), then sort by file size, and finally map the array references back to just the file names. This is all done in a single step:

my @sorted =
    map $_->[0],                 # 3. map to file name
    sort { a$->[1] <=> b$->[1] } # 2. sort by size
    map [ $_, -s $_ ],           # 1. evaluate size once for each file
    @unsorted;

In your case, the question is how expensive it is to extract the third field of each array element. When in doubt, measure to compare different methods. The speedup in the file size example is dramatic at about a factor 10 for a few dozen files!

The Schwartzian Transform applied to your problem would look something like this:

my @sorted =
    map $_->[0],                         # 3. Map to original array
    sort { $a->[1] <=> $b->[1] }         # 2. Sort by third column
    map [ $_, ( split( ' ', $_ ) )[2] ], # 1. Use Sobrique's idea
    @array;

If the operation used is so expensive that you want to avoid performing it more than once per value in case you have identical array elements, you can cache the results as outlined in this question; this is known as the Orcish Maneuver.

Upvotes: 0

Sobrique
Sobrique

Reputation: 53478

You need to expand out your sort function a little further. I'm also not sure that split is working the way you think it is. Split turns text into an array based on a delimiter.

I think your problem is that your regular expression - thanks to the gm flags - isn't matching what you think it's matching. I'd perhaps approach it slightly differently though:

#!/usr/bin/perl
use strict;
use warnings;

my @array = <DATA>;

sub sort_third_num {
   my $a1 = (split ( ' ', $a ) )[2];
   my $b1 = (split ( ' ', $b )) [2];
   return $a1 <=> $b1;
}

print sort sort_third_num @array;

__DATA__
NDDUMMY   drawing2   139       30        1        0        0        0        0        0
RMDUMMY    drawing2   69        2         1        0        0        0        0        0  
PIMP       drawing    7         0         1444     718      437      0        0        0

This does the trick, for example.

If you're set on doing a regex approach:

sub sort_third_num {
    my ($a1) = $a =~ m/\s(\d+)/;
    my ($b1) = $b =~ m/\s(\d+)/;
    return $a1 <=> $b1;
}

not globally matching means only the first element is returned. And only the first match of 'whitespace-digits' is returned. We also compare numerically, rather than stringwise.

Upvotes: 1

Related Questions