Reputation: 997
I'm trying to write a Pig script that takes a data set containing user id, date, country code, and other attributes. I want to group by user id and date and, for each group, concatenate the country codes into a single field.
For example:
user_id | date | country_code
1 | 2017-01-01 | US
1 | 2017-01-01 | UK
1 | 2017-01-02 | FR
2 | 2017-01-02 | RU
2 | 2017-01-03 | DE
2 | 2017-01-03 | AU
Desired output:
(1, 2017-01-01, "US,UK")
(1, 2017-01-02, FR)
(2, 2017-01-02, RU)
(2, 2017-01-03, "DE,AU")
Upvotes: 2
Views: 516
Reputation: 21563
A question with very different wording yielded this answer by @Hari Shankar. Since this question does not appear to be a duplicate, I will post the answer here directly:
grouped = GROUP table BY userid;
X = FOREACH grouped GENERATE group as userid, table.clickcount as clicksbag, table.pagenumber as pagenumberbag;
Now X will be: {(155,{(2),(3),(1)},{(12),(133),(144)}), (156,{(6),(7)},{(1),(5)})}
Now you need to use the builtin UDF BagToTuple:
result = FOREACH X GENERATE userid, BagToTuple(clicksbag) as clickcounts, BagToTuple(pagenumberbag) as pagenumbers;
result should now contain what you want. You can fold this step into the previous FOREACH as well:
result = FOREACH grouped GENERATE group as userid, BagToTuple(table.clickcount) as clickcounts, BagToTuple(table.pagenumber) as pagenumbers;
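Applied to the data in the question, the same pattern might look like the sketch below. It rests on a few assumptions: the input is tab-separated and lives at a hypothetical path 'users.tsv', the relation is called data, and the fields carry the names shown in the example. It also swaps BagToTuple for the builtin BagToString, which joins a bag's values into one chararray with a given delimiter, because the desired output is a comma-joined string rather than a tuple.

```pig
-- Assumption: tab-separated input at a hypothetical path; adjust the loader to your data.
data = LOAD 'users.tsv' USING PigStorage('\t')
       AS (user_id:int, date:chararray, country_code:chararray);

-- Group on the composite key (user_id, date); 'group' becomes a tuple of both fields.
grouped = GROUP data BY (user_id, date);

-- BagToString joins the projected country codes with ',' (its default delimiter is '_').
result = FOREACH grouped GENERATE
             group.user_id AS user_id,
             group.date AS date,
             BagToString(data.country_code, ',') AS country_codes;

DUMP result;
```

Pig does not guarantee the order of tuples inside a grouped bag, so the codes within each group may come out in a different order than in the input unless you sort them in a nested FOREACH first.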
1: http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/builtin/BagToTuple.html
Upvotes: 2