Reputation: 11337
I have a file with three columns: a date, a minimum temperature, and a maximum temperature.
01012010 4.5 15.9
I need to calculate the average of the two temperatures for each day. This seems easy to do with a UDF, but I would like to know whether there is a way to do it without one.
I managed to get it working with something like this (concatenating the temperatures and then flattening them), but it seems far too complex:
table = LOAD 'e7/temp.csv' USING PigStorage('\t') as (day:chararray, min:float, max:float);
day_group = FOREACH table GENERATE day, FLATTEN(TOKENIZE( CONCAT(CONCAT( (chararray)min, ','), (chararray)max) )) as minMax;
day_group_cast = FOREACH day_group GENERATE day, (float) minMax as minMax;
day_mean_group = GROUP day_group_cast BY day;
day_mean = FOREACH day_mean_group GENERATE group as day, AVG(day_group_cast.minMax) as minMax;
Upvotes: 0
Views: 856
Reputation: 2287
As per @Enrichman's comment, the snippet below should be sufficient:
temp_data = LOAD 'temp.csv' USING PigStorage(',') AS (day:chararray, min:float, max:float);
req_stats = FOREACH temp_data GENERATE day, (min+max)/2 AS avg_temp;
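For reference, here is a minimal sketch of the same idea applied to the file exactly as the question loads it. It assumes the file is tab-separated (as in the question's own LOAD statement) and has one row per day; the alias names `temps` and `day_mean` are only illustrative.

```pig
-- Sketch: per-row average of min and max, with no UDF and no GROUP.
-- Assumes a tab-separated file with one row per day, as in the question.
temps = LOAD 'e7/temp.csv' USING PigStorage('\t')
        AS (day:chararray, min:float, max:float);

-- Plain arithmetic on the two fields; if either field is null
-- (for example a value that could not be parsed as a float), the result is null.
day_mean = FOREACH temps GENERATE day, (min + max) / 2 AS avg_temp;

DUMP day_mean;
```

Because both temperatures sit on the same row, the average is computed within each record, so the CONCAT/TOKENIZE/FLATTEN step and the GROUP BY are not needed.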
Upvotes: 1