Dadapeer Dudekula

How do I find the top two ratings in pig?

I have the data shown below:

USA,10  
UK,8  
INDIA,8  
PAKISTAN,5  
U.A.E,3  
GERMANY,3  
SWEDEN,2

How do I get the top two highest-rating countries? With the above sample data, I would want this:

UK,8  
INDIA,8 

Answers (1)

Sivasakthi Jayaraman

Can you try this?

UPDATE:
If your Pig version doesn't have the RANK operator, this problem is very difficult to solve in native Pig. One option is to download pig-0.11.1.jar, add it to your classpath, and try the approach below.

input.txt

USA,10
UK,8
INDIA,8
PAKISTAN,5
U.A.E,3
GERMANY,3
SWEDEN,2

Pig script:

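-- Over and Stitch are piggybank UDFs: piggybank.jar must be registered or on the classpath
DEFINE MyOver org.apache.pig.piggybank.evaluation.Over('myrank:int');
DEFINE MyStitch org.apache.pig.piggybank.evaluation.Stitch;

A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
-- put every record into a single bag so the whole relation can be ranked at once
B = GROUP A ALL;
C = FOREACH B  {
                 -- highest rating first
                 mysort = ORDER A BY rating DESC;
                 -- MyOver computes a dense rank over the sorted bag;
                 -- MyStitch attaches it to each tuple as the myrank field
                 GENERATE FLATTEN(MyStitch(mysort,MyOver(mysort,'dense_rank',0,1,1)));
                }
-- keep only the rows with the second-highest rating
D = FILTER C BY stitched::myrank==2;
E = FOREACH D GENERATE stitched::country AS country,stitched::rating AS rating;
DUMP E;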

Output:

(UK,8)
(INDIA,8)
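To make clear which rows the `dense_rank` filter keeps, here is a minimal plain-Python sketch (not Pig; the function name `dense_rank_filter` is hypothetical) that mimics the same steps: order by rating descending, give tied ratings the same rank with no gaps, and keep rank 2.

```python
def dense_rank_filter(rows, target_rank):
    """Return rows whose dense rank (by rating, descending) equals target_rank.

    Dense ranking gives tied ratings the same rank and does not skip
    numbers after a tie: 10 -> 1, 8 -> 2, 5 -> 3, 3 -> 4, 2 -> 5.
    """
    # Distinct ratings, highest first; index + 1 is the dense rank.
    distinct = sorted({rating for _, rating in rows}, reverse=True)
    rank_of = {rating: i + 1 for i, rating in enumerate(distinct)}
    # Keep the input order of the matching rows.
    return [(country, rating) for country, rating in rows
            if rank_of[rating] == target_rank]


data = [("USA", 10), ("UK", 8), ("INDIA", 8), ("PAKISTAN", 5),
        ("U.A.E", 3), ("GERMANY", 3), ("SWEDEN", 2)]

for row in dense_rank_filter(data, 2):
    print(row)
```

With dense ranking, rank 2 always means "the second-highest distinct rating", even if several countries tie for first place.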

Pig 0.11 and later support the RANK operator:

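A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
-- RANK prepends a rank_A field; tied ratings share a rank and the next rank is skipped (1,2,2,4,...)
-- use RANK A BY rating DESC DENSE; for gap-free ranks (1,2,2,3,...)
B = RANK A BY rating DESC;
-- keep only the rows ranked second
C = FILTER B BY rank_A==2;
D = FOREACH C GENERATE country,rating;
DUMP D;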

Output:

(UK,8)
(INDIA,8)
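Without DENSE, RANK assigns a standard rank, which only matches the dense-rank result here because there is no tie for first place. The plain-Python sketch below (not Pig; `standard_rank` and the `tied_top` data are hypothetical) imitates that behaviour and shows the case where the two approaches differ.

```python
def standard_rank(rows):
    """Attach a standard (non-dense) rank, ordering by rating descending.

    Tied ratings share a rank and the next rank skips ahead by the size
    of the tie, e.g. ratings 10, 8, 8, 5 get ranks 1, 2, 2, 4.
    """
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for position, (country, rating) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == rating:
            rank = ranked[-1][0]  # same rating as the previous row: reuse its rank
        else:
            rank = position       # new rating: rank is the 1-based position
        ranked.append((rank, country, rating))
    return ranked


data = [("USA", 10), ("UK", 8), ("INDIA", 8), ("PAKISTAN", 5),
        ("U.A.E", 3), ("GERMANY", 3), ("SWEDEN", 2)]
rank2 = [(c, r) for rank, c, r in standard_rank(data) if rank == 2]
print(rank2)  # [('UK', 8), ('INDIA', 8)]

# Hypothetical data with a tie at the top: no row gets standard rank 2.
tied_top = [("USA", 10), ("UK", 10), ("INDIA", 8)]
print([(c, r) for rank, c, r in standard_rank(tied_top) if rank == 2])  # []
```

If the goal is "the second-highest rating" regardless of ties at the top, the DENSE variant (or the dense_rank approach above) is the safer filter.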
