Reputation: 97
I have two data sets, EmployeeDetail that contains 4 columns(id, name, gender, location) and SalaryDetail(id, salary). I joined both dats sets and grouped them as location.
EmpDetail = load '/Users/bmohanty6/EmployeeDetails/EmpDetail.txt' as (id:int, name:chararray, gender:chararray, location:chararray);
SalaryDetail = load '/Users/bmohanty6/EmployeeDetails/EmpSalary.txt' as (id:int, salary:float);
JoinedEmpDetail = join EmpDetail by id, SalaryDetail by id;
GroupedByLocation = group JoinedEmpDetail by location;
DUMP GroupedByLocation gives me the correct result I expect. Now when I try to take average using below line,
AverageSalary = foreach GroupedByLocation generate group, AVG(SalaryDetail.salary);
It throws below error.
<line 11, column 58> Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. Please use an explicit cast.
I also tried in following way. But got same error.
AverageSalary = foreach GroupedByLocation {
Sum = SUM(SalaryDetail.salary);
Count = COUNT(SalaryDetail.salary);
avgSal = Sum/Count;
generate group as location, avgSal;
};
This time error was:
Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Can anyone please suggest me a correct way of doing it.
Thanks Sivasakthi Jayaraman for answering my question.
AverageSalary = foreach GroupedByLocation generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);
This gives me the average salary per each location.
Now I tried to find out average salary per gender in each location
. So I tried to group by gender
inside GroupedByLocation
variable. But facing some issues in it.
GroupdByGender = foreach GroupedByLocation {
genderGrp = group JoinedEmpDetail by JoinedEmpDetail.EmpDetail::gender;
avgSalary = foreach genderGrp generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);
generate group as location, JoinedEmpDetail.EmpDetail::gender, avgSalary;
};
I got this error
Syntax error, unexpected symbol at or near 'JoinedEmpDetail'
Can anyone please help.
Upvotes: 0
Views: 791
Reputation: 4724
You can't access salary
column like this, first you need to project the JoinedEmpDetail
relation and then access the salary
column.
Can you try the below stmt?
AverageSalary = foreach GroupedByLocation generate group, AVG(JoinedEmpDetail.SalaryDetail::salary);
Upvotes: 1