Enrichman

Reputation: 11337

Calculate average between two columns using Pig

I have a file with three columns: a date, a minimum temperature, and a maximum temperature.

01012010    4.5    15.9

I need to calculate the average of the two temperatures for each day. This seems easy to do with a UDF, but I would like to know whether there is a way to do it without one.

I managed to get a result with the script below, which concatenates the two temperatures and then flattens them, but it seems far too complex:

table          = LOAD 'e7/temp.csv' USING PigStorage('\t') as (day:chararray, min:float, max:float);
day_group      = FOREACH table GENERATE day, FLATTEN(TOKENIZE( CONCAT(CONCAT( (chararray)min, ','), (chararray)max) )) as minMax;
day_group_cast = FOREACH day_group GENERATE day, (float) minMax as minMax;
day_mean_group = GROUP day_group_cast BY day;
day_mean       = FOREACH day_mean_group GENERATE group as day, AVG(day_group_cast.minMax) as minMax;

Upvotes: 0

Views: 856

Answers (1)

Murali Rao

Reputation: 2287

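As per @Enrichman's comments, the snippet below is sufficient. It computes the average directly on each row, so no GROUP or AVG is needed. It loads a comma-separated file; if your file is tab-separated, as in the question, use PigStorage('\t') instead.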

temp_data =  LOAD 'temp.csv' USING PigStorage(',') AS (day:chararray, min:float, max:float);
req_stats = FOREACH temp_data GENERATE day, (min+max)/2 AS avg_temp;
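
If some rows may be missing a temperature, a variant along the following lines might help. This is only a sketch: it assumes a tab-separated file as in the question, and the aliases valid and daily_avg are illustrative names, not part of the original scripts. Arithmetic on a null operand yields null in Pig, so the filter drops incomplete rows before averaging.

```pig
-- Assumption: tab-separated input, as in the question's LOAD statement.
temp_data = LOAD 'temp.csv' USING PigStorage('\t') AS (day:chararray, min:float, max:float);

-- Keep only rows where both temperatures are present (null + x is null in Pig).
valid     = FILTER temp_data BY min IS NOT NULL AND max IS NOT NULL;

-- Per-row average of the two columns; float / int stays a float.
daily_avg = FOREACH valid GENERATE day, (min + max) / 2 AS avg_temp;

DUMP daily_avg;
```

For the sample row in the question (4.5 and 15.9), this should give an average of roughly 10.2, subject to float rounding.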

Upvotes: 1
