sivaraj

MAX(COUNT) function in Apache Pig Latin

I am trying to write the program below in Apache Pig, since the data is unstructured.

i) I have a dataset that contains street name, city and state.

ii) Group by state.

iii) Count how many times each state appears in the dataset, so my output is state name, count.

Program:

realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);

A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate);

The output looks like this:

CA,14
washington,20

Now I need the maximum count, so my output should be: washington,20

How should I proceed? Please help me resolve this.


Answer

franklinsijo

Apply ORDER and LIMIT to the generated result:

realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate) AS c;

-- Arrange the tuples by count in descending order
D = ORDER B BY c DESC;

-- Keep only the first tuple, i.e. the state with the maximum count
E = LIMIT D 1;

DUMP E;
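LIMIT D 1 always returns exactly one row, so if two states share the highest count, one of them is dropped and which one survives is not defined. A sketch that follows the MAX(COUNT) idea from the title and keeps every tied state is below. It assumes Pig 0.8 or later (needed to use the single-row relation M as a scalar), and the aliases C, M, T and max_c are illustrative names, not part of the original answer.

```pig
-- Sketch only: assumes Pig 0.8+ scalar projection (M.max_c).
-- 'DATA' is the input path placeholder from the question; C, M, T, max_c are illustrative aliases.
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);

-- Count records per state
A = GROUP realestate BY state;
B = FOREACH A GENERATE group AS state, COUNT(realestate) AS c;

-- Put all (state, count) tuples into a single group and take the largest count
C = GROUP B ALL;
M = FOREACH C GENERATE MAX(B.c) AS max_c;

-- M holds exactly one tuple, so M.max_c can be read as a scalar value
T = FILTER B BY c == M.max_c;

DUMP T;
```

If B held only the two sample rows from the question (CA,14 and washington,20), T would contain just washington,20; if another state also had 20, it would be kept as well instead of being cut by LIMIT.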
