I have map data in a text file:
[age#27,height#5.8]
[age#25,height#5.3]
[age#27,height#5.10]
[age#25,height#5.1]
I want to display the average height for each age group.
This is the LOAD statement:
records = LOAD '~/Documents/Pig_Map.txt' AS (details:map[]);
records: {details: map[]}
Then I grouped the data based on age:
group_data = GROUP records BY details#'age';
group_data: {group: bytearray,records: {(details: map[])}}
To access the details, I did a FLATTEN like this (I'm not sure whether I need this step):
flatten_records = FOREACH group_data GENERATE group,FLATTEN(records);
flatten_records: {group: bytearray,records::details: map[]}
DUMP flatten_records;
This gives me the following output:
(25,[height#5.1,age#25])
(25,[height#5.3,age#25])
(27,[height#5.10,age#27])
(27,[height#5.8,age#27])
Now I want to get the average height; I tried this:
display_records = FOREACH flatten_records GENERATE group,AVG(records.details#'height');
The error is:
<line 10, column 57> Multiple matching functions for org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}}, {{(double)}}). Please use an explicit cast.
Please advise.
Can you try this?
records = LOAD '~/Documents/Pig_Map.txt' AS (details:map[]);
records1 = FOREACH records GENERATE details#'age' AS age,details#'height' AS height;
group_data = GROUP records1 BY age;
display_records = FOREACH group_data GENERATE group,AVG(records1.height);
dump display_records;
Output:
(25,5.199999999999999)
(27,5.449999999999999)
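Since the error message asks for an explicit cast, here is a variant of the same script that casts the map values before grouping. This is a sketch, assuming the same input file and that every record has both an 'age' and a 'height' key; the (int) and (double) casts and the age/avg_height aliases are additions, not part of the script above. As in the script above, AVG is applied to the bag records1.height that GROUP produces, not to flattened rows. Note that height is read as a plain number, so 5.10 becomes 5.1 (not 5 feet 10 inches). That is why the averages come out as (5.3 + 5.1) / 2 = 5.2 and (5.8 + 5.1) / 2 = 5.45, up to floating-point error.

```pig
-- Load each line as a single untyped map (values are bytearray)
records = LOAD '~/Documents/Pig_Map.txt' AS (details:map[]);

-- Pull the two keys out of the map and cast them explicitly,
-- so AVG resolves to its double version without ambiguity
records1 = FOREACH records GENERATE
    (int)details#'age' AS age,
    (double)details#'height' AS height;

-- One bag of records per distinct age
group_data = GROUP records1 BY age;

-- Average the heights inside each bag
display_records = FOREACH group_data GENERATE
    group AS age,
    AVG(records1.height) AS avg_height;

DUMP display_records;
```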