Reputation: 1
Hi, I am trying to load the following data (it includes different delimiters and is unstructured) into Pig using Pig Latin only, without preparing the data beforehand with, e.g., Java.
Input:
1234 #one,#two,#three
5679 #one,#two
1234 #one
Output I am looking for:
1234 #one
1234 #two
1234 #three
5679 #one
5679 #two
1234 #one
Any ideas? Is this even possible in Pig? Thanks a lot in advance!
Upvotes: 0
Views: 297
Reputation: 2287
Pig Script :
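-- Split each line on the space into a key and a comma-separated value string.
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- TOKENIZE splits value on ',' into a bag; FLATTEN emits one row per bag element.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ','));
DUMP B;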
Input (a.csv):
1234 #one,#two,#three
5679 #one,#two
1234 #one
Output (DUMP B):
(1234,#one)
(1234,#two)
(1234,#three)
(5679,#one)
(5679,#two)
(1234,#one)
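DUMP prints tuples in parentheses. If you want the space-separated lines from the question written to a file instead, something along these lines should work. This is only a sketch: the alias tag and the output directory 'out' are names I made up for illustration, and STORE will fail if 'out' already exists.

```pig
-- Same space-delimited input file as above.
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- One row per comma-separated token; 'tag' is a hypothetical alias for the flattened field.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ',')) AS tag;
-- 'out' is a hypothetical output directory; PigStorage(' ') puts a space between key and tag.
STORE B INTO 'out' USING PigStorage(' ');
```

The part files under 'out' should then contain lines such as "1234 #one" rather than "(1234,#one)".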
Upvotes: 1