Reputation: 73
I have an array with many fields in each line spaced by different spacing like:
INDDUMMY drawing2 139 30 1 0 0 0 0 0
RMDUMMY drawing2 69 2 1 0 0 0 0 0
PIMP drawing 7 0 1444 718 437 0 0 0
I'm trying to make sorting for this array by number in 3rd field so the desired output should be:
PIMP drawing 7 0 1444 718 437 0 0 0
RMDUMMY drawing2 69 2 1 0 0 0 0 0
INDDUMMY drawing2 139 30 1 0 0 0 0 0
I tried to make a split using regular expression within the sorting function like:
@sortedListOfLayers = sort {
split(m/\w+\s+(\d+)\s/gm,$a)
cmp
split(m/\w+\s+(\d+)\s/gm,$b)
}@listOfLayers;
but it doesn't work correctly. How I could make that type of sorting?
Upvotes: 1
Views: 434
Reputation: 52112
If you want to sort a list and the operation used in the sort
block is expensive, an often used Perl idiom is the Schwartzian Transform: you apply the operation once to each list element and store the result alongside the original element, sort, then map back to your original format.
The classic textbook example is sorting files in a directory by size using the expensive -s
file test. A naïve approach would be
my @sorted = sort { -s $a <=> -s $b } @unsorted;
which has to perform -s
twice for each comparison operation.
Using the Schwartzian Transform, we map the file names into a list of array references, each referencing an array containing the list element and its size (which has to be determined only once per file), then sort by file size, and finally map the array references back to just the file names. This is all done in a single step:
my @sorted =
map $_->[0], # 3. map to file name
sort { a$->[1] <=> b$->[1] } # 2. sort by size
map [ $_, -s $_ ], # 1. evaluate size once for each file
@unsorted;
In your case, the question is how expensive it is to extract the third field of each array element. When in doubt, measure to compare different methods. The speedup in the file size example is dramatic at about a factor 10 for a few dozen files!
The Schwartzian Transform applied to your problem would look something like this:
my @sorted =
map $_->[0], # 3. Map to original array
sort { $a->[1] <=> $b->[1] } # 2. Sort by third column
map [ $_, ( split( ' ', $_ ) )[2] ], # 1. Use Sobrique's idea
@array;
If the operation used is so expensive that you want to avoid performing it more than once per value in case you have identical array elements, you can cache the results as outlined in this question; this is known as the Orcish Maneuver.
Upvotes: 0
Reputation: 53478
You need to expand out your sort function a little further. I'm also not sure that split
is working the way you think it is. Split turns text into an array based on a delimiter.
I think your problem is that your regular expression - thanks to the gm
flags - isn't matching what you think it's matching. I'd perhaps approach it slightly differently though:
#!/usr/bin/perl
use strict;
use warnings;
my @array = <DATA>;
sub sort_third_num {
my $a1 = (split ( ' ', $a ) )[2];
my $b1 = (split ( ' ', $b )) [2];
return $a1 <=> $b1;
}
print sort sort_third_num @array;
__DATA__
NDDUMMY drawing2 139 30 1 0 0 0 0 0
RMDUMMY drawing2 69 2 1 0 0 0 0 0
PIMP drawing 7 0 1444 718 437 0 0 0
This does the trick, for example.
If you're set on doing a regex approach:
sub sort_third_num {
my ($a1) = $a =~ m/\s(\d+)/;
my ($b1) = $b =~ m/\s(\d+)/;
return $a1 <=> $b1;
}
not globally matching means only the first element is returned. And only the first match of 'whitespace-digits' is returned. We also compare numerically, rather than stringwise.
Upvotes: 1