dedpo

Reputation: 502

PIG - how to group by field, which has multiple entries

I want to group by hour. I know the hour field will contain repeated values; for example, hour 11 below appears multiple times. How do I do this?

hour,windSpeed
11, 3.6
2 , 6.8
11, 2.5
13, 5.0
14, 8.9
11, 3.2

So I have this, and I only want to group by hour.

So, for example, we'd like {11: 3.6, 2.5, 3.2}

and the remaining hours, which have only one value each, would each form their own group:

{14: 8.9}

{2: 6.8}

answer = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;

Upvotes: 0

Views: 86

Answers (2)

SunSmiles

Reputation: 186

Try this.

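-- The sample data is comma-separated, so pass the delimiter explicitly
-- (PigStorage defaults to tab).
A = LOAD 'data' USING PigStorage(',') AS (Hour:chararray, windSpeed:chararray);
B = GROUP A BY Hour;
-- With a single grouping key, group is already a scalar, so no FLATTEN is needed.
C = FOREACH B GENERATE group AS Hour, A.windSpeed;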

Note: this code is untested.
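
A variant that also copes with the stray spaces in the sample rows (such as "2 , 6.8") and casts both fields to numbers might look like the sketch below. The path 'weather.csv' and the aliases readings, typed, valid, grouped, and speeds_by_hour are placeholders, not names from the question, and this sketch is equally untested. It assumes that a header row (hour,windSpeed), if present in the file, fails the numeric cast and becomes null, so it is filtered out before grouping.

```pig
-- Load both columns as text first; 'weather.csv' is a hypothetical path.
readings = LOAD 'weather.csv' USING PigStorage(',') AS (hour:chararray, windSpeed:chararray);
-- TRIM removes surrounding whitespace before the numeric casts.
typed = FOREACH readings GENERATE (int)TRIM(hour) AS hour, (double)TRIM(windSpeed) AS windSpeed;
-- Drop rows whose hour could not be cast (for example a header line).
valid = FILTER typed BY hour IS NOT NULL;
grouped = GROUP valid BY hour;
-- Keep the hour plus a bag holding only that hour's wind speeds.
speeds_by_hour = FOREACH grouped GENERATE group AS hour, valid.windSpeed AS speeds;
DUMP speeds_by_hour;
```

Each output tuple then pairs one hour with a bag of its wind speeds, which is the shape asked for in the question. The order of values inside a bag is not guaranteed.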

Upvotes: 1

nobody

Reputation: 11080

Group by hour

A = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;
B = GROUP A BY hour;
DUMP B;

If you want to aggregate the values in each group, use SUM:

C = FOREACH B GENERATE group AS hour, SUM(A.speed) AS Total;
DUMP C;
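
SUM is only one of Pig's built-in aggregates; COUNT, AVG, and MAX can be applied to the grouped bag in the same way. A minimal sketch, assuming a comma-delimited file at the hypothetical path 'weather.csv' with no header row and values that cast cleanly to numbers (the alias D and the output field names are illustrative):

```pig
-- 'weather.csv' is a hypothetical path; PigStorage(',') splits on commas.
weather_data = LOAD 'weather.csv' USING PigStorage(',');
-- Cast explicitly so the aggregates work on numeric types.
A = FOREACH weather_data GENERATE (int)$0 AS hour, (double)$1 AS speed;
B = GROUP A BY hour;
-- One output row per hour: number of readings, mean speed, and peak speed.
D = FOREACH B GENERATE group AS hour,
        COUNT(A) AS readings,
        AVG(A.speed) AS avg_speed,
        MAX(A.speed) AS max_speed;
DUMP D;
```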

Upvotes: 1
