Reputation: 3388
I have a very simple two-column dataset, with a chararray and a double:
user1 234.43
user1 432.23
user2 4321.213
etc.
I want to group by user, then compute the average of each user's doubles. How? Do I need a "GROUP * ALL"? I'm trying to follow Example 2 in the Pig overview (http://wiki.apache.org/pig/PigOverview), but it's not working for me. My script:
selfReportsAndDiscrepancies = FOREACH discrepancies1 GENERATE discrepancy,selfReportedText;
perDiscrepancy = GROUP selfReportsAndDiscrepancies BY selfReportedText;
allDiscrep = group perDiscrepancy all;
means = FOREACH allDiscrep GENERATE AVG(perDiscrepancy.discrepancy);
DUMP means;
DESCRIBE means;
This gives me:
2013-04-02 17:54:06,611 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1128: Cannot find field discrepancy in group:chararray,selfReportsAndDiscrepancies:bag{:tuple(discrepancy:double,selfReportedText:chararray)}
Upvotes: 2
Views: 1215
Reputation: 3284
I hope I understood you correctly: you want the average of the group averages.
VISITS = LOAD 'data' USING PigStorage(' ') AS (user:chararray, number:double);
USER_VISITS = GROUP VISITS BY user;
USER_AVG = FOREACH USER_VISITS GENERATE group AS user, AVG(VISITS.number) AS average;
ALL_AVG = GROUP USER_AVG ALL;
OVERALL_AVG = FOREACH ALL_AVG GENERATE AVG(USER_AVG.average);
DUMP OVERALL_AVG;
Results in:
(2327.2715)
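If what you need is one average per group rather than a single overall number, the second GROUP ... ALL can probably be dropped. The ERROR 1128 in the question appears to come from that extra level: after the first GROUP, the relation only has the fields group and the bag selfReportsAndDiscrepancies (exactly what the error message lists), so perDiscrepancy.discrepancy cannot be resolved. Below is a minimal sketch using the question's relation names. It assumes discrepancies1 is already loaded with discrepancy:double and selfReportedText:chararray; perText and meanDiscrepancy are names made up for illustration.

```pig
-- Assumption: discrepancies1 is already loaded with the fields
-- discrepancy:double and selfReportedText:chararray, as in the question.
selfReportsAndDiscrepancies = FOREACH discrepancies1 GENERATE discrepancy, selfReportedText;

-- After GROUP, each row is (group, bag). The bag is named after the
-- grouped relation, i.e. selfReportsAndDiscrepancies.
perText = GROUP selfReportsAndDiscrepancies BY selfReportedText;

-- Address the field through the bag name, not through the grouped relation.
-- perText and meanDiscrepancy are hypothetical names.
means = FOREACH perText GENERATE group AS selfReportedText,
        AVG(selfReportsAndDiscrepancies.discrepancy) AS meanDiscrepancy;

DUMP means;
```

This yields one row per distinct selfReportedText value. Adding a GROUP means ALL step on top, as in the script above, would then give the average of those group averages.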
Upvotes: 2