Reputation: 2861
In Pig I have the following structure:
(1, {(2), (2), (3), (12)})
and I want to transform it into:
(1, {(2,2), (3,1), (12,1)})
It's just a group by and count inside the bag: (group_key, count)
I've tried a GROUP BY nested inside a FOREACH, but it doesn't work.
How can I do this in Pig Latin? Or should I write a UDF myself?
Thanks!
Upvotes: 0
Views: 401
Reputation: 5801
You can just FLATTEN out the bag and then re-group. This might be wasteful if you have a great many rows, each with a small bag, because every bag element then goes through two GROUP steps. In that case I would recommend a UDF. This should work for you (untested):
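DUMP A;
-- (1, {(2), (2), (3), (12)})
DESCRIBE A;
-- (x:int, y:bag{})
-- Unnest the bag: one row per (x, z) pair
B = FOREACH A GENERATE x, FLATTEN(y) AS z;
-- Count how often each z occurs for each x
C = GROUP B BY (x, z);
D = FOREACH C GENERATE group.x, group.z, COUNT(B) AS ct;
-- Collect the (z, ct) pairs back into one bag per x
E = GROUP D BY x;
F = FOREACH E GENERATE group, D.(z,ct);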
F should be what you are looking for.
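If you go the UDF route instead, the per-bag logic is small. Here is a minimal sketch in Python of what such a UDF might compute, assuming the bag arrives as a list of one-field tuples. The name count_in_bag is made up for illustration, and the wiring needed to register it with Pig and declare its output schema is left out, so treat this as the core logic only (also untested inside Pig):

```python
from collections import Counter


def count_in_bag(bag):
    """Turn a bag of one-field tuples into (value, count) tuples.

    Hypothetical helper: the name and the list-of-tuples input shape
    are assumptions, not something from the original question.
    """
    if bag is None:
        return None
    # Count occurrences of the single field in each tuple.
    counts = Counter(t[0] for t in bag)
    # Sort by value so the result order is deterministic.
    return sorted(counts.items())
```

With the bag from the question, count_in_bag([(2,), (2,), (3,), (12,)]) returns [(2, 2), (3, 1), (12, 1)]. Since everything happens per row, this avoids the FLATTEN and both GROUP steps.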
Upvotes: 1