Winter

Reputation: 1490

Ranking in Apache Pig

Is there a good way to do ranking on a column in Apache Pig after you've sorted it? Even better would be if the ranking handled ties.

A = LOAD 'file.txt' as (score:int, name:chararray);
B = ORDER A BY score;
....

Upvotes: 4

Views: 2434

Answers (4)

Narendra Parmar

Reputation: 1409

You can use the RANK operator in Pig, and it also handles ties. However, it uses only one reducer while applying the rank, so performance will suffer.
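
As a rough sketch of the tie handling, assuming the question's file.txt and its (score, name) schema: ranking BY a field gives rows with equal values the same rank, and the optional DENSE keyword keeps the rank values free of gaps. The aliases R and RD are made up for illustration.

```pig
-- Load the question's data (path and schema taken from the question).
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- Rank by score, highest first. Rows with equal scores share a rank,
-- and the rank after a tie skips ahead (for example 1, 2, 2, 4).
R = RANK A BY score DESC;

-- Same ranking, but DENSE leaves no gaps after ties (for example 1, 2, 2, 3).
RD = RANK A BY score DESC DENSE;

DUMP R;
DUMP RD;
```

In both cases the rank is prepended as the first field of each tuple, so it can be addressed as $0.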

Upvotes: 0

Krishna Kalyan

Reputation: 1702

You should combine both solutions: sort first, then rank.

B = ORDER A BY score DESC;
C = rank B;

Let's say you want the second largest:

D = filter C by $0 == 2;
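
One caveat: RANK without a BY clause simply numbers the rows in sequence, so tied scores get different ranks and the filter returns a single row. If every row holding the second-highest score is wanted, a variant along these lines might work. It is only a sketch, again assuming the question's file.txt and schema.

```pig
-- Load the question's data (path and schema taken from the question).
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- Rank on the score field itself; DENSE makes tied scores share a rank
-- without leaving gaps, so rank 2 always means "second-highest score".
C = RANK A BY score DESC DENSE;

-- $0 is the rank column that RANK prepends to each tuple.
D = FILTER C BY $0 == 2;

DUMP D;
```

DENSE matters here: without it, a tie for first place would produce ranks 1, 1, 3 and the filter on rank 2 would return nothing.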

Upvotes: 0

Krishna Kalyan

Reputation: 1702

Try the RANK operator:

A = load 'data' AS (f1:chararray,f2:int,f3:chararray);

DUMP A;
(David,1,N)
(Tete,2,N)
B = rank A;

dump B;
(1,David,1,N)
(2,Tete,2,N)

Reference: https://blogs.apache.org/pig/entry/apache_pig_it_goes_to

Upvotes: 2

sheimi

Reputation: 48

I think you could use the ORDER BY operator:

B = ORDER A BY score DESC;

or

B = ORDER A BY score ASC;

Upvotes: 0
