user1855165

Reputation: 289

Hadoop Pig: return the top 5 rows of each group

I want to return the top 5 rows of each group. I have a table of state names and their cities, grouped by state name. For each state I want only the top 5 cities, not all of them.

How can I do this using Pig? Thank you in advance.

Upvotes: 4

Views: 19761

Answers (1)

Donald Miner

Reputation: 39893

After a GROUP BY, you can use a nested FOREACH to do an ORDER BY first, then a LIMIT. This sorts the tuples in each group by city size in descending order, then keeps the top 5.

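-- A is assumed to have the fields state, cityname, and citysize
B = GROUP A BY state;
C = FOREACH B {
   DA = ORDER A BY citysize DESC;   -- largest cities first within each state
   DB = LIMIT DA 5;                 -- keep at most 5 tuples per state
   -- Project both fields from the bag and flatten once,
   -- so each output row pairs a city with its own size
   GENERATE group AS state, FLATTEN(DB.(cityname, citysize));
}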
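
For reference, here is a rough end-to-end sketch of the same idea, from loading the data to printing the result. The input path 'cities.tsv', the tab delimiter, the field types, and the aliases cities, by_state, ordered, top_cities, and top5_per_state are all assumptions made for illustration; adjust them to your own data.

```pig
-- Hypothetical input: one tab-separated line per city (state, cityname, citysize).
-- The file name and schema are assumptions, not taken from the question.
cities = LOAD 'cities.tsv' USING PigStorage('\t')
         AS (state:chararray, cityname:chararray, citysize:long);

-- One group per state; each group holds a bag of that state's city tuples.
by_state = GROUP cities BY state;

top5_per_state = FOREACH by_state {
    ordered    = ORDER cities BY citysize DESC;  -- largest cities first
    top_cities = LIMIT ordered 5;                -- at most 5 tuples per state
    -- Flatten the limited bag once: one output row per (state, city) pair.
    GENERATE group AS state, FLATTEN(top_cities.(cityname, citysize));
}

DUMP top5_per_state;
```

Each output row should have the shape (state, cityname, citysize), with at most 5 rows per state. States with fewer than 5 cities simply return all of their cities.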

Upvotes: 11
