Reputation: 1490
Is there a good way to do ranking on a column in Apache Pig after you've sorted it? Even better would be if the ranking handled ties.
A = LOAD 'file.txt' as (score:int, name:chararray);
B = ORDER A BY score;
....
Upvotes: 4
Views: 2434
Reputation: 1409
You can use RANK in Pig, and it also handles ties, but it uses only one reducer while applying the rank, so performance will suffer.
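As a rough sketch of the tie handling, assuming the `(score, name)` input from the question: as far as I know, ties only share a rank when RANK is given a BY clause. Without BY it just prepends a sequential row number. The file name and schema below are taken from the question; everything else is illustrative.

```pig
-- Assumes 'file.txt' holds tab-separated (score, name) rows, as in the question.
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- RANK ... BY sorts on the field itself, so a separate ORDER is not needed.
-- Tied scores share a rank and leave a gap after it (e.g. 1, 1, 3).
B = RANK A BY score DESC;

-- DENSE keeps ranks consecutive after a tie (e.g. 1, 1, 2).
C = RANK A BY score DESC DENSE;

DUMP B;
DUMP C;
```

The rank is prepended as the first field of each tuple, so it can be referenced as `$0` in later statements.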
Upvotes: 0
Reputation: 1702
You should use a mix of both solutions:
B = ORDER A BY score DESC;
C = rank B;
Let's say you want the second largest:
D = filter C by $0 == 2;
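One caveat: a plain `rank` on an already sorted relation numbers the rows sequentially, so two rows with the same top score would get 1 and 2. If tied scores should count as one position, a possible variant (a sketch only, reusing the question's `file.txt` schema as an assumption) is to let RANK do the sorting with DENSE:

```pig
-- Assumes the (score, name) input from the question.
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- Rank directly on score; DENSE gives tied scores the same rank with no gaps.
C = RANK A BY score DESC DENSE;

-- $0 is the prepended rank field; this keeps every row with the second-highest score.
D = FILTER C BY $0 == 2;

DUMP D;
```

With this form `D` may contain more than one row when several names share the second-highest score.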
Upvotes: 0
Reputation: 1702
Try the RANK operator:
A = load 'data' AS (f1:chararray,f2:int,f3:chararray);
DUMP A;
(David,1,N)
(Tete,2,N)
B = rank A;
dump B;
(1,David,1,N)
(2,Tete,2,N)
Reference: https://blogs.apache.org/pig/entry/apache_pig_it_goes_to
Upvotes: 2