Reputation:
I have a dataset in the following format:
student_id|name|subject|marks
2 John English 50
3 mark Maths 50
3 mark English 50
This data is loaded into HDFS. I need to calculate the average marks across all subjects for each student using Pig. What would be the Pig approach to do this?
Upvotes: 0
Views: 228
Reputation: 11080
Group by student and take the average. This assumes you have already loaded the data into a relation A.
B = GROUP A BY student_id;
C = FOREACH B GENERATE group,AVG(A.marks);
DUMP C;
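
The snippet above assumes relation A already exists. As a rough end-to-end sketch, the load step could look like the following. The HDFS path is hypothetical, the field types are guesses based on the sample rows, and the delimiter is assumed to be a single space as in the sample data (use PigStorage('|') instead if the file is actually pipe-delimited like the header line). The relation names A_clean and avg_marks are illustrative only.

```pig
-- Hypothetical path; delimiter assumed to be a single space.
A = LOAD '/user/hadoop/students.txt' USING PigStorage(' ')
    AS (student_id:int, name:chararray, subject:chararray, marks:int);

-- Drop rows whose student_id did not parse as an int (e.g. a header line).
A_clean = FILTER A BY student_id IS NOT NULL;

-- One group per student; including name in the key carries it to the output.
B = GROUP A_clean BY (student_id, name);

-- AVG runs over the bag of marks in each group; FLATTEN unpacks the key tuple.
C = FOREACH B GENERATE FLATTEN(group) AS (student_id, name),
                       AVG(A_clean.marks) AS avg_marks;

DUMP C;
```

Grouping on (student_id, name) rather than student_id alone is a design choice so that the name appears next to the average; if the same ID can appear with different spellings of the name, group on student_id only.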
Upvotes: 1