Reputation: 47861
I'd like to create a flattened join table from the following schema:
titles = FOREACH programs GENERATE (px.pig.udf.PARSE_KEYWORDS(program_xml))
AS program:
(root_id: long,
ids:bag {(idtype:chararray, idvalue:chararray)},
keywords:bag {(keytype:chararray,keyvalue:chararray)});
If the input is:
(1, {('x','foo'),('y','bar')},{})
(2, {('x','fiz'),('y','buzz')},{})
(3, {('x','moo')},{})
...
The output should be something like:
root_id idvalue
1 foo
1 bar
2 fiz
2 buzz
3 moo
How would I do that in Pig?
Upvotes: 0
Views: 316
Reputation: 131
Project the first two columns:
x = foreach titles generate root_id, ids;
Then flatten the second column:
y = foreach x generate root_id, FLATTEN(ids) as (idtype:chararray, idvalue:chararray);
This will give you the result in the following format:
root_id idtype idvalue
1 x foo
1 y bar
2 x fiz
2 y buzz
3 x moo
Project the first and third columns to get the required result.
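That last projection might look like the following. This is a minimal sketch, assuming the relation y from the FLATTEN step; the alias z is just a name picked here for illustration. Also note that if titles still wraps its fields in the program tuple shown in the question, the first step would probably need program.root_id and program.ids rather than the bare field names.

```pig
-- Keep only root_id and idvalue from the flattened relation y,
-- dropping the idtype column.
z = FOREACH y GENERATE root_id, idvalue;

-- Print the result to check it (z is a hypothetical alias).
DUMP z;
```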
Upvotes: 2