Reputation: 25
Any help in this would be greatly appreciated! Best way is with an example:
Input:
Schema:
Name|phone_type|phone_num
Example data:
Kyle|Cell|555-222-3333
Kyle|Home|453-444-5555
Tom|Home|555-555-5555
Tom|Pager|555-555-4344
Desired output:
Schema:
Name|Home_num|Cell_num|Pager_num
Example:
Kyle|453-444-5555|555-222-3333|null
Tom|555-555-5555|null|555-555-4344
Code:
data=Load 'test.txt' using PigStorage('|');
grpd= Group data by $0;
Foreach grpd{
???
}
Upvotes: 0
Views: 1235
Reputation: 1691
After the comment of @Murali lao, I rewrite the solution.
I now use FILTER, and then the trick to not filter empty bag with FLATTEN is to add an empty string when the bag is empty.
Here are my test data:
tom,home,555
tom,pager,666
tom,cell,777
bob,home,111
bob,cell,222
Here is my solution:
data = LOAD 'phone' USING PigStorage(',') AS (name:chararray, phone_type: chararray, phone_num: chararray);
user = FOREACH (GROUP data BY name) {
home = FILTER $1 BY phone_type == 'home';
-- you add an empty string if the the bag is empty
homenum = (IsEmpty(home) ? {('')} : home.phone_num);
pager = FILTER $1 BY phone_type == 'pager';
pagernum = (IsEmpty(pager) ? {('')} : pager.phone_num);
cell = FILTER $1 BY phone_type == 'cell';
cellnum = (IsEmpty(cell) ? {('')} : cell.phone_num);
GENERATE group as name, FLATTEN(homenum) as home, FLATTEN(pagernum) as pager, FLATTEN(cellnum) as cell;
};
After a dump, I obtain the following result :
(bob,111,,222)
(tom,555,666,777)
Upvotes: 2