Reputation: 49
I am trying to do the following in Apache Pig on an unstructured dataset:
i) I have a dataset that contains street name, city and state.
ii) I group it by state.
iii) I take the count of rows for each state, so my output looks like statename,count, i.e. how many times each state appears in the dataset.
program:
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate);
The output looks like:
CA,14
washington,20
Now I need the max of the counts, so my output should be "washington,20".
How should I proceed? Please help me resolve this.
Upvotes: 1
Views: 391
Reputation: 18300
Apply ORDER and LIMIT on the generated result:
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate) AS c;
-- Sort the tuples by count in descending order
D = ORDER B BY c DESC;
-- Keep only the first tuple of the sorted result, i.e. the one with the max count
E = LIMIT D 1;
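One caveat: ORDER followed by LIMIT 1 returns a single row even when several states share the highest count. If ties matter, a possible alternative is to compute the maximum with MAX over the whole relation and then filter on it. This is only a sketch, not part of the approach above; the aliases G, M and F are made up for illustration, and it assumes B is already defined with the count field named c.

```pig
-- Assumes B has the schema (group, c) from the GROUP/COUNT step.
-- Put every row of B into a single group so MAX sees all counts.
G = GROUP B ALL;
-- M holds exactly one row containing the highest count.
M = FOREACH G GENERATE MAX(B.c) AS max_c;
-- Use the single-row relation M as a scalar and keep every state that matches it.
F = FILTER B BY c == M.max_c;
DUMP F;
```

With the sample output from the question, F would contain only the washington row; if another state also had a count of 20, it would be kept as well.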
Upvotes: 1