MyFirstName MyLastName

Reputation: 207

Sort a huge file

I want to sort a huge file of approximately 20M rows so that I can get the highest scorers per team.

I want to be considerate of the system's resources, so:

  1. Is there a way to do this without putting all the data into a hash/array in Perl?
  2. Can we do this using the Unix/Linux sort utility?

If so, can you please show how to do it?

My input file will have about 20M rows in the following format:

Chicago Bulls|Michael Jordan|38
LA Lakers|Kobe Bryant|32
Chicago Bulls|Steve Kerr|16
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19
Chicago Bulls|Scottie Pippen|23
.
.
.

Upvotes: 3

Views: 202

Answers (3)

dwarring

Reputation: 4883

You don't need to sort.

#!/usr/bin/perl
use warnings; use strict;

# team name => { player => highest score seen for that player }
my %high_score;

while (<DATA>) {
    chomp;
    my ($team_name, $player, $score) = split(/\|/);
    # 'for' aliases $_ to the hash slot, so assigning to $_ updates it in place
    for ($high_score{$team_name}{$player}) {
        $_ = $score
            unless $_ && $_ > $score;
    }
}

for my $team_name (sort keys %high_score) {
    my %team_scores = %{ $high_score{$team_name} };
    # this team's players, highest score first
    my @top_players = sort { $team_scores{$b} <=> $team_scores{$a} } (keys %team_scores);

    # print only the two best players of each team
    my $n = 0;
    for my $player (@top_players) {
        print "$team_name, $player high score: $team_scores{$player}\n";
        last if ++$n >= 2;
    }
}

__DATA__
Chicago Bulls|Michael Jordan|38
Chicago Bulls|Scottie Pippen|23
Chicago Bulls|Poor Joe|10
Chicago Bulls|Steve Kerr|16
LA Lakers|Kobe Bryant|32
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19

Edits: (1) updated requirements (2) s/while/for/
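The code above still keeps one hash entry per player, which can approach the row count if most players are distinct. A minimal sketch of a tighter variant, assuming the three-field pipe-separated format from the question: it streams the file once and keeps at most N players per team (N = 2 here), so memory grows with the number of teams rather than the number of rows. The helper name top_n_per_team() and the "-" convention for STDIN are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep at most $n [player, score] pairs per team, highest score first.
# Memory grows with (number of teams) * $n, not with the number of rows.
# top_n_per_team() is a hypothetical helper name used for illustration.
sub top_n_per_team {
    my ($fh, $n) = @_;
    my %top;    # team name => [ [player, best score], ... ]
    while (my $line = <$fh>) {
        chomp $line;
        my ($team, $player, $score) = split /\|/, $line;
        next unless defined $score;    # skip blank or malformed lines

        my $list = $top{$team} ||= [];
        # A full list only accepts scores above its current lowest entry.
        next if @$list >= $n && $score <= $list->[-1][1];

        # Keep one entry per player (their best score), as the original code does.
        my ($seen) = grep { $_->[0] eq $player } @$list;
        if ($seen) {
            $seen->[1] = $score if $score > $seen->[1];
        }
        else {
            push @$list, [ $player, $score ];
        }
        @$list = sort { $b->[1] <=> $a->[1] } @$list;
        pop @$list while @$list > $n;    # drop the lowest scores
    }
    return \%top;
}

# Script mode: read the file named as the first argument ('-' for STDIN).
if (@ARGV) {
    my $path = shift @ARGV;
    my $in;
    if ($path eq '-') { $in = \*STDIN }
    else { open $in, '<', $path or die "Cannot open $path: $!\n" }
    my $top = top_n_per_team($in, 2);
    for my $team (sort keys %$top) {
        print "$team, $_->[0] high score: $_->[1]\n" for @{ $top->{$team} };
    }
}
```

Usage, with a hypothetical script name: perl top_scores.pl infile. Re-sorting the tiny per-team list on every accepted row is cheap for small N; for a large N a heap would be the usual replacement.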

Upvotes: 3

Birei

Reputation: 36272

I don't know whether sort will cope with such a big file, but you can try the following command. It uses the pipe as the field separator (-t'|'), sorts by the first field (-k1,1), and breaks ties on the third field numerically in reverse, i.e. descending, order (-k3,3nr):

sort -t'|' -k1,1 -k3,3nr infile

It yields:

Chicago Bulls|Michael Jordan|38
Chicago Bulls|Scottie Pippen|23
Chicago Bulls|Steve Kerr|16
LA Lakers|Kobe Bryant|32
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19
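GNU sort is built for inputs larger than memory: it sorts chunks into temporary files and merges them, and -S (buffer size) and -T (temporary directory) control how much memory and which disk it uses; setting LC_ALL=C usually speeds up the comparisons. Once the file is ordered as above, each team's best rows come first, so picking the top N per team only needs to remember the current team name. A minimal Perl filter for that step, assuming the sorted pipe-separated output shown above; first_n_per_team() is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reads lines already sorted by team, then by score descending (for example
# the output of: sort -t'|' -k1,1 -k3,3nr infile) and copies the first $n
# lines of each team to $out. Only the current team name is held in memory.
# first_n_per_team() is a hypothetical helper name used for illustration.
sub first_n_per_team {
    my ($in, $out, $n) = @_;
    my ($current, $count) = (undef, 0);
    while (my $line = <$in>) {
        my ($team) = split /\|/, $line, 2;
        if (!defined $current || $team ne $current) {
            ($current, $count) = ($team, 0);    # a new team starts here
        }
        print {$out} $line if ++$count <= $n;
    }
}

# Script mode: filter the file named as the first argument ('-' for STDIN).
if (@ARGV) {
    my $path = shift @ARGV;
    my $in;
    if ($path eq '-') { $in = \*STDIN }
    else { open $in, '<', $path or die "Cannot open $path: $!\n" }
    first_n_per_team($in, \*STDOUT, 2);
}
```

Usage, with a hypothetical script name: LC_ALL=C sort -t'|' -k1,1 -k3,3nr infile | perl first_n.pl -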

Upvotes: 2

Kevin

Reputation: 56129

sort does let you set the order per key (e.g. -k3,3nr sorts only the third field, numerically and descending), but another option is to use two sorts in a pipeline, with the -s option making the second sort stable:

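sort -t'|' -k3,3nr file.in | sort -s -t'|' -k1,1

The first sort orders all rows by score, highest first; the second, stable sort groups them by team while keeping that score order within each team. Restricting the second key to -k1,1 matters: a bare -k1 key extends to the end of the line, so rows within a team would be reordered by player name instead.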
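To illustrate why the second pass must be stable, here is a small in-memory Perl sketch of the same two-pass idea (suitable only for data that fits in memory, unlike sort itself): order by score descending, then re-sort by team alone. The use sort 'stable' pragma asks Perl to guarantee that rows with equal teams keep their score order; two_pass_sort() is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use sort 'stable';    # guarantee that equal keys keep their relative order

# Two-pass sort mirroring the "sort | sort -s" pipeline on in-memory rows:
# pass 1 orders by score (descending), pass 2 orders by team name only,
# and stability keeps pass 1's score order inside each team.
# two_pass_sort() is a hypothetical helper name used for illustration.
sub two_pass_sort {
    my @rows     = map { [ split /\|/ ] } @_;
    my @by_score = sort { $b->[2] <=> $a->[2] } @rows;
    my @by_team  = sort { $a->[0] cmp $b->[0] } @by_score;
    return map { join '|', @$_ } @by_team;
}
```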

Upvotes: 2
