Reputation: 207
I want to sort a huge file of approximately 20M rows so I can get the highest scorers per team.
I want to be considerate of the system's resources. So...
If so, can you please show how to do it?
My input file will have about 20M rows in the following format:
Chicago Bulls|Michael Jordan|38
LA Lakers|Kobe Bryant|32
Chicago Bulls|Steve Kerr|16
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19
Chicago Bulls|Scottie Pippen|23
.
.
.
Upvotes: 3
Views: 202
Reputation: 4883
You don't need to sort.
#!/usr/bin/perl
use warnings; use strict;
my %high_score;   # $high_score{team}{player} = that player's best score
while (<DATA>) {
    chomp;
    my ($team_name, $player, $score) = split(/\|/);
    # Alias $_ to the stored score and keep whichever value is larger.
    for ($high_score{$team_name}{$player}) {
        $_ = $score
            unless $_ && $_ > $score;
    }
}
for my $team_name (sort keys %high_score) {
    my %team_scores = %{ $high_score{$team_name} };
    my @top_players = sort { $team_scores{$b} <=> $team_scores{$a} } (keys %team_scores);
    my $n = 0;
    # Print only the top two players per team.
    for my $player (@top_players) {
        print "$team_name, $player high score: $team_scores{$player}\n";
        last if ++$n >= 2;
    }
}
__DATA__
Chicago Bulls|Michael Jordan|38
Chicago Bulls|Scottie Pippen|23
Chicago Bulls|Poor Joe|10
Chicago Bulls|Steve Kerr|16
LA Lakers|Kobe Bryant|32
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19
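The same no-sort idea can be sketched as a single streaming pass in the shell. This is only a rough sketch, not a port of the Perl above: it keeps just one best row per team (so memory grows with the number of teams, not players), and the function name best_per_team is made up for illustration.

```shell
# best_per_team: hypothetical helper; reads team|player|score rows on stdin
# and prints one row per team holding the highest score seen for that team.
best_per_team() {
    awk -F'|' '
        # Keep the row if the team is new or the score beats the stored best.
        !($1 in best) || $3 + 0 > best[$1] {
            best[$1] = $3 + 0
            who[$1]  = $2
        }
        END {
            for (team in best)
                print team "|" who[team] "|" best[team]
        }
    ' | sort   # awk iteration order is unspecified; sort the small result
}

# Usage with the sample rows from the question:
printf '%s\n' \
    'Chicago Bulls|Michael Jordan|38' \
    'LA Lakers|Kobe Bryant|32' \
    'Chicago Bulls|Steve Kerr|16' \
    'LA Lakers|Paul Gasol|20' \
    'LA Lakers|Shaquile ONeal|19' \
    'Chicago Bulls|Scottie Pippen|23' |
    best_per_team
```

Only the final, one-line-per-team result is sorted, so the 20M-row input is never sorted at all.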
Edits: (1) updated requirements (2) s/while/for/
Upvotes: 3
Reputation: 36272
I don't know if sort will break with such a big file, but you can try the following command. It splits fields on the pipe character, then sorts by the first field and then by the third field numerically in reverse (descending) order, which is what the n and r modifiers on the third key do:
sort -t'|' -k1,1 -k3,3nr infile
It yields:
Chicago Bulls|Michael Jordan|38
Chicago Bulls|Scottie Pippen|23
Chicago Bulls|Steve Kerr|16
LA Lakers|Kobe Bryant|32
LA Lakers|Paul Gasol|20
LA Lakers|Shaquile ONeal|19
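On the concern about file size: GNU sort spills to temporary files when the input does not fit in its memory buffer, so a large file should mainly cost disk space and time. A possible way to bound its footprint is sketched below, assuming GNU sort (the -S and -T options are not required by POSIX). The wrapper name sort_scores and the SORT_MEM and SORT_TMP variables are invented for this example.

```shell
# sort_scores: hypothetical wrapper around the command above.
# Assumes GNU sort; -S caps the in-memory buffer, -T picks the spill directory.
# LC_ALL=C compares bytes instead of using locale collation, which is
# usually faster and is enough for this ASCII data.
sort_scores() {
    LC_ALL=C sort -t'|' -k1,1 -k3,3nr \
        -S "${SORT_MEM:-256M}" \
        -T "${SORT_TMP:-${TMPDIR:-/tmp}}" \
        "$@"
}

# Usage: pass file names as arguments, or feed rows on stdin as here.
printf '%s\n' \
    'Chicago Bulls|Michael Jordan|38' \
    'LA Lakers|Kobe Bryant|32' \
    'Chicago Bulls|Steve Kerr|16' \
    'LA Lakers|Paul Gasol|20' |
    sort_scores
```

The 256M default is an arbitrary illustration, not a recommendation; pick a value that suits the machine.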
Upvotes: 2
Reputation: 56129
sort can order one column ascending and another descending through per-key modifiers (for example -k3,3nr), but as an alternative you can use two sorts in a pipeline, with the -s option for stable sorting on the second:
sort -t\| -rnk3 file.in | sort -st\| -k1,1
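Sorting alone still leaves every row in the output. To go from the sorted stream to the highest scorers per team, a small awk filter can keep the first N rows of each team. The sketch below assumes a sort that supports -s (GNU and BSD do; it is not in POSIX) and restricts the second key to the team field so the stable sort preserves the score order within each team. The function name top_per_team is hypothetical.

```shell
# top_per_team N: hypothetical helper; reads team|player|score rows on stdin
# and prints the N highest-scoring rows of each team (default 2).
top_per_team() {
    n="${1:-2}"
    sort -t'|' -k3,3nr | sort -s -t'|' -k1,1 |
        awk -F'|' -v n="$n" '
            $1 != team { team = $1; seen = 0 }   # new team: reset the counter
            seen++ < n                           # print while under the limit
        '
}

# Usage with the sample rows from the question:
printf '%s\n' \
    'Chicago Bulls|Michael Jordan|38' \
    'LA Lakers|Kobe Bryant|32' \
    'Chicago Bulls|Steve Kerr|16' \
    'LA Lakers|Paul Gasol|20' \
    'LA Lakers|Shaquile ONeal|19' \
    'Chicago Bulls|Scottie Pippen|23' |
    top_per_team 2
```

With the sample rows this prints Michael Jordan and Scottie Pippen for Chicago Bulls, then Kobe Bryant and Paul Gasol for LA Lakers.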
Upvotes: 2