aliocee

Reputation: 730

In Perl, how can I sort records using numbers that appear before the first colon (:)?

I have the following sequence of data in Perl:

143:0.0209090909090909 
270:0.0909090909090909 
32:0.0779090909090909 
326:0.3009090909090909

How can I sort them numerically by the number before the colon, so that I get this output?

32:0.0779090909090909
143:0.0209090909090909
270:0.0909090909090909  
326:0.3009090909090909

Upvotes: 2

Views: 216

Answers (6)

Sinan Ünür

Reputation: 118156

Given the variety of answers, I figured some benchmarks might be appropriate. Please double-check the benchmarking code before trusting these numbers: I wrote the script in a hurry.

#!/usr/bin/env perl

use 5.012;
use strict;
use warnings;

use Benchmark qw( cmpthese );

use constant DATA_SIZE => 1000;

cmpthese( -1, {
    right_thing   => sub { do_the_right_thing ( make_data(rt => DATA_SIZE) ) },
    re_extract    => sub { re_extract         ( make_data(re => DATA_SIZE) ) },
    split_extract => sub { split_extract      ( make_data(se => DATA_SIZE) ) },
    schxfrom_re   => sub { schxform_re        ( make_data(sx => DATA_SIZE) ) },
    nop           => sub { nop                ( make_data(nl => DATA_SIZE) ) },
});

sub do_the_right_thing {
    my ($DATA) = @_;
    no warnings 'numeric';
    [ sort { $a <=> $b } @$DATA ];
}

sub re_extract {
    my ($DATA) = @_;
    my $re = qr/^([0-9]+):/;
    [ sort { ($a =~ $re)[0] <=> ($b =~ $re)[0] } @$DATA ];
}

sub split_extract {
    my ($DATA) = @_;
    [
        sort {
            my ($x, $y) = map split(/:/, $_, 2), $a, $b;
            $x <=> $y
        } @$DATA
    ];
}

sub schxform_re {
    my ($DATA) = @_;
    [
        map    $_->[0],
        sort { $a->[1] <=> $b->[1] }
        map  { [ $_, m/^([0-9]+):/ ] } @$DATA
    ];
}

sub nop {
    my ($DATA) = @_;
    [ @$DATA ];
}

sub make_data {
    state %cache;
    my ($k, $n) = @_;

    unless (exists $cache{$k}) {
        $cache{ $k } =  [
            map
            sprintf('%d:%f', int(rand 10_000), rand),
            1 .. $n
        ];
    }

    return $cache{ $k };
}

Results

                Rate re_extract schxfrom_re split_extract right_thing        nop
re_extract    32.1/s         --        -85%          -92%        -98%       -99%
schxfrom_re    213/s       565%          --          -46%        -87%       -94%
split_extract  392/s      1121%         84%            --        -76%       -89%
right_thing   1614/s      4933%        657%          312%          --       -53%
nop           3459/s     10685%       1522%          783%        114%         --
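
Since the note above asks readers to double-check the benchmark, one cheap sanity check is to confirm that the strategies agree on the resulting key order before comparing their speed. The following is a minimal sketch and not part of the original script: `prefix_key_string`, `%strategy` and `@mismatches` are hypothetical names, and each strategy is re-implemented on a plain list rather than an array reference.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same record shape as make_data produces: "<integer>:<fraction>".
my @data = map { sprintf '%d:%f', int(rand 10_000), rand } 1 .. 1000;

# Hypothetical helper: join the numeric prefixes of a list of records.
sub prefix_key_string {
    return join ',', map { my ($k) = split /:/, $_, 2; $k } @_;
}

# Simplified re-implementations of the benchmarked strategies.
my %strategy = (
    right_thing => sub {
        no warnings 'numeric';
        return sort { $a <=> $b } @_;
    },
    re_extract => sub {
        my $re = qr/^([0-9]+):/;
        return sort { ($a =~ $re)[0] <=> ($b =~ $re)[0] } @_;
    },
    split_extract => sub {
        return sort {
            my ($x) = split /:/, $a, 2;
            my ($y) = split /:/, $b, 2;
            $x <=> $y
        } @_;
    },
    schxform_re => sub {
        return map  { $_->[0] }
               sort { $a->[1] <=> $b->[1] }
               map  { [ $_, /^([0-9]+):/ ] } @_;
    },
);

# Independent reference: extract the keys first, then sort them numerically.
my $expected = join ',', sort { $a <=> $b }
               map { my ($k) = split /:/, $_, 2; $k } @data;

# Compare only the key sequences, so records with equal keys cannot
# cause a false mismatch.
my @mismatches;
for my $name (sort keys %strategy) {
    my $got = prefix_key_string( $strategy{$name}->(@data) );
    push @mismatches, $name if $got ne $expected;
    printf "%-13s %s\n", $name, $got eq $expected ? 'ok' : 'MISMATCH';
}
```

If any line reports MISMATCH, the timing for that strategy is not comparable with the others.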

Upvotes: 0

flesk

Reputation: 7579

What, no Schwartzian transform yet?

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my @data = qw(143:0.0209090909090909
            270:0.0909090909090909 
            32:0.0779090909090909 
            326:0.3009090909090909);

my @sorted = map $_->[0], sort {$a->[1] <=> $b->[1]} map {[$_, m/^(.+):/]} @data;

print Dumper \@sorted;

Output:

$VAR1 = [
          '32:0.0779090909090909',
          '143:0.0209090909090909',
          '270:0.0909090909090909',
          '326:0.3009090909090909'
        ];
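
For readers new to the idiom, the one-liner above can be unrolled into its three stages. This is an illustrative sketch only: the intermediate names `@decorated` and `@ordered` are not from the original answer, and it assumes every record starts with digits followed by a colon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data = qw(
    143:0.0209090909090909
    270:0.0909090909090909
    32:0.0779090909090909
    326:0.3009090909090909
);

# 1. Decorate: pair each record with its sort key, extracted once.
my @decorated = map { [ $_, /^(\d+):/ ] } @data;

# 2. Sort: compare only the precomputed keys.
my @ordered = sort { $a->[1] <=> $b->[1] } @decorated;

# 3. Undecorate: keep the original records, now in key order.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

The key is extracted once per record rather than once per comparison, which is the point of the transform.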

Upvotes: 1

Alan Haggai Alavi

Reputation: 74272

The built-in sort function can be used with a comparison block that extracts the number before the colon:

Program

#!/usr/bin/env perl

use strict;
use warnings;

my @data = qw(
  143:0.0209090909090909
  270:0.0909090909090909
  32:0.0779090909090909
  326:0.3009090909090909
);

my $match = qr/^(\d+):/;
@data = sort { ( $a =~ $match )[0] <=> ( $b =~ $match )[0] } @data;

print join( "\n", @data ), "\n";

Output

32:0.0779090909090909
143:0.0209090909090909
270:0.0909090909090909
326:0.3009090909090909

Upvotes: 2

tadmc

Reputation: 3744

It does not matter that there are colons there.

Perl's rules for converting strings to numbers will just do The Right Thing:

#!/usr/bin/perl
use warnings;
use strict;

my @nums = qw(
    143:0.0209090909090909 
    270:0.0909090909090909 
    32:0.0779090909090909 
    326:0.3009090909090909
);

{ no warnings 'numeric';
    @nums = sort {$a <=> $b} @nums;
}

print "$_\n" for @nums;
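
To see why this works, it can help to look at what Perl makes of such a string in numeric context: it uses the leading numeric portion and ignores the rest. The snippet below is a small sketch, not part of the original answer; `numeric_value` is a hypothetical helper introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the number Perl sees when the string
# is used in numeric context.
sub numeric_value {
    my ($str) = @_;
    no warnings 'numeric';    # silence "Argument ... isn't numeric"
    return $str + 0;
}

print numeric_value('143:0.0209090909090909'), "\n";    # 143
print numeric_value('32:0.0779090909090909'),  "\n";    # 32
print numeric_value('1e3:0.5'),                "\n";    # 1000: the exponent belongs to the numeric prefix
print numeric_value(':0.5'),                   "\n";    # 0: no leading number at all
```

The last two lines show the limits of the shortcut: it relies on every record starting with a plain number, which holds for the data in the question.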

Upvotes: 7

sehe

Reputation: 393684

I'd simply use the sort command-line utility:

sort -n < input.txt

Otherwise, in Perl:

use strict;
use warnings;

my @lines = (<>); 
print for sort { 
    my @aa = split(/:/, $a); 
    my @bb = split(/:/, $b); 
    $aa[0] <=> $bb[0]    # <=> already compares numerically
} @lines;

Upvotes: 1

RET

Reputation: 9188

Something along the lines of:

my @sorted = sort { my ($a1) = split(/:/,$a);
                    my ($b1) = split(/:/,$b);
                    $a1 <=> $b1 } @data ;

$a1 and $b1 hold the first field of each of the two records being compared, that is, the text before the first colon.

Upvotes: 1
