user1675314

Apache Pig Student Marks Average Calculation

I have a dataset in the following format:

student_id|name|subject|marks

2          John English   50

3          mark Maths     50

3          mark English   50

This data is loaded into HDFS. I need to calculate each student's average marks across all subjects using Pig. What would be the Pig approach to do this?

Upvotes: 0

Views: 228

Answers (1)

nobody

Reputation: 11080

Group by student and take the average of the marks in each group. This assumes you have already loaded the data into relation A.

B = GROUP A BY student_id;
C = FOREACH B GENERATE group, AVG(A.marks);
DUMP C;
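
For completeness, here is a minimal end-to-end sketch that includes the load step. The HDFS path, the '|' delimiter, and the field types are assumptions based on the header line in the question, not something the question confirms; if the file is actually space- or tab-separated, change the PigStorage argument accordingly. Grouping on both student_id and name is a design choice so that the name appears in the output.

```pig
-- Hypothetical path and delimiter: adjust both to match the real file in HDFS.
A = LOAD '/user/hadoop/student_marks.txt' USING PigStorage('|')
    AS (student_id:int, name:chararray, subject:chararray, marks:int);

-- One group per student; the group key is a (student_id, name) tuple.
B = GROUP A BY (student_id, name);

-- Flatten the group key back into columns and average the marks in each bag.
C = FOREACH B GENERATE FLATTEN(group) AS (student_id, name),
                       AVG(A.marks) AS avg_marks;

DUMP C;
```

With the three sample rows from the question, this should produce one row per student, and each student's average works out to 50.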

Upvotes: 1
