darkjh

Reputation: 2861

Pig: group within a bag for every row

In Pig I have the following structure:

(1, {(2), (2), (3), (12)})

and I want to transform it into:

(1, {(2,2), (3,1), (12,1)})

It's just a group-by and count inside the bag, producing (group_key, count) tuples.

I've tried nesting a GROUP BY inside a FOREACH, but it doesn't work.

How can I do this in Pig Latin? Or should I write a UDF myself?

Thanks!

Upvotes: 0

Views: 401

Answers (1)

reo katoa

Reputation: 5801

You can FLATTEN the bag and then re-group. This might be wasteful if you have a great many rows, each with a small bag, since every bag element becomes its own row and has to pass through two GROUPs. In that case I would recommend a UDF. This should work for you (untested):

DUMP A;
(1, {(2), (2), (3), (12)})
DESCRIBE A;
(x:int, y:bag{})

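-- one row per bag element: (1,2), (1,2), (1,3), (1,12)
B = FOREACH A GENERATE x, FLATTEN(y) AS z;
-- count how often each (x, z) pair occurs
C = GROUP B BY (x, z);
D = FOREACH C GENERATE group.x, group.z, COUNT(B) AS ct;
-- collect the (z, ct) pairs back into one bag per x
E = GROUP D BY x;
F = FOREACH E GENERATE group, D.(z,ct);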

F should be what you are looking for.
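
If you go the UDF route instead, the per-bag logic is small. Here is a rough sketch in Python (also untested inside Pig), assuming you use Pig's Jython UDF support, where a bag is passed to the function as a list of tuples. The name count_in_bag is just something I made up for illustration, and I've left out the output schema declaration you would need to get a typed result back in Pig.

```python
def count_in_bag(bag):
    """Count occurrences of each value in a bag.

    Assumes the bag arrives as a list of one-field tuples, e.g.
    [(2,), (2,), (3,), (12,)], and returns a list of (value, count)
    tuples in first-seen order.
    """
    if bag is None:
        # Pass nulls through unchanged.
        return None

    counts = {}
    order = []  # first-seen order, so the result is deterministic
    for (value,) in bag:
        if value not in counts:
            counts[value] = 0
            order.append(value)
        counts[value] += 1

    return [(value, counts[value]) for value in order]


# Quick check outside of Pig, using the bag from the question.
print(count_in_bag([(2,), (2,), (3,), (12,)]))
# prints [(2, 2), (3, 1), (12, 1)]
```

This uses a plain dict rather than collections.Counter so that it also runs on older Jython versions. The point of doing it this way is that each row's bag is counted in place, so nothing has to be flattened and re-grouped across the whole relation.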

Upvotes: 1
