venkat87
venkat87

Reputation: 67

How to find max of a tuple in pig latin

i ended up with the below data using pig latin.
(all,{((1,a),(3,b),(2,d))}) now i need to find the max of the first digit(1/2/3) then display that whole tuple. I expect the output as 3,b.
please help me how to write this in pig latin.

Upvotes: 1

Views: 748

Answers (3)

KP Fingh
KP Fingh

Reputation: 79

Let us load and group the relation named emp and then find that emp which has maximum salary.

emp = load '' USING PigStorage() AS (employee_id:int, employee_name:chararray, job:chararray, mgr:int, hiredate:datetime, salary:int, commission:int, department_no:int);

emp_salary_id_mgr = foreach emp generate salary, employee_name;

emp_grouped = group emp_salary_id_mgr all;

(all,{(5000,KING),(2850,BLAKE),(2450,CLARK),(2975,JONES),(3000,SCOTT),(3000,FORD),(800,SMITH),(1600,ALLEN),(1250,WARD),(1250,MARTIN),(1500,TURNER),(1100,ADAMS),(950,JAMES),(1300,MILLER),(10000,KPIYA),(1300,KILLER)})

Let us describe the grouped relation

emp_grouped: {group: chararray,emp_salary_id_mgr: {(salary: int,employee_name: chararray)}}

Let us find max salary

sal_max = foreach emp_grouped generate MAX(emp_salary_id_mgr.salary) AS max_salary_val;

(10000)

But we need to find entire row (10000,KPIYA) so let us perform join

emp_grouped_join_sal_max = JOIN emp_grouped by emp_salary_id_mgr.salary , sal_max BY max_salary_val;

--Error message 'Cannot merge join keys, incompatible types'

Now we need to flatten bag so that we can perform JOIN

emp_grouped_flattened = foreach emp_grouped GENERATE FLATTEN($1);

emp_joined_maxsal = JOIN emp_grouped_flattened by emp_salary_id_mgr::salary, sal_max by max_salary_val;

emp_output = foreach emp_joined_maxsal generate TOTUPLE(emp_grouped_flattened::emp_salary_id_mgr::salary,emp_grouped_flattened::emp_salary_id_mgr::employee_name);

The output comes as below which is same as desired
((10000,KPIYA))

Upvotes: 1

venkat87
venkat87

Reputation: 67

flattensum1 = FOREACH sumcolors1 GENERATE          
FLATTEN(TOBAG(TOTUPLE($0,$1),TOTUPLE($2,$3),TOTUPLE($4,$5));

orderflattensum1 = ORDER flattensum1 by $0 desc;

limitorderflattensum1 = LIMIT orderflattensum1 1;

I went a step back with my data. i took this one as input (1,a),(3,b),(2,d) instead of grouping them, i used the above statements to make them as individual rows and then ordering it by desc. then limited the rows to 1.

anyway thanks for your responses. i am going to try your responses as well.

Upvotes: 0

Ankur Alankar Biswal
Ankur Alankar Biswal

Reputation: 1182

Steps to get max:

A = load 'pdemo/sample' using Pigstorage(',')  as(id:int,name:chararray); 
grp = group A all;   
res = foreach grp generate MAX(A.id) as max;
ij = join A by id,res by max;  
fresult = foreach ij generate A::id,A::name;
dump fresult 

Hope this will help..

Upvotes: 0

Related Questions