Ka L

Reputation: 1

Loading unstructured data with different delimiters in Pig using Pig Latin only

Hi, I am trying to load the following data (which is unstructured and uses different delimiters) into Pig using Pig Latin only, without preprocessing the data in, e.g., Java.

Input:

1234 #one,#two,#three
5679 #one,#two
1234 #one

Desired output:

1234 #one
1234 #two
1234 #three
5679 #one
5679 #two
1234 #one

Any ideas? Is this even possible in Pig? Thanks a lot in advance!

Upvotes: 0

Views: 297

Answers (1)

Murali Rao

Reputation: 2287

Pig script:

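-- Split each line on the single space: key gets the number, value gets the comma-separated list
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- TOKENIZE(value, ',') returns a bag of tokens; FLATTEN emits one output row per token
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ','));
DUMP B;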

Input (a.csv):

1234 #one,#two,#three
5679 #one,#two
1234 #one

Output of DUMP B:

(1234,#one)
(1234,#two)
(1234,#three)
(5679,#one)
(5679,#two)
(1234,#one)
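To see why each input line turns into several output rows, the row-level logic of the Pig script can be mimicked in plain Python. This is only an illustrative sketch, not a replacement for the Pig-only solution: the helper names (`load_space_delimited`, `tokenize`, `flatten_tokens`) are hypothetical and only approximate what `PigStorage(' ')`, `TOKENIZE` and `FLATTEN` do per row.

```python
# Illustration only: mimics the Pig pipeline row by row in plain Python.
# The helper names are hypothetical; this is not how Pig executes the script.

def load_space_delimited(lines):
    """Rough analogue of LOAD ... USING PigStorage(' ') AS (key, value)."""
    rows = []
    for line in lines:
        fields = line.split(" ")
        key = fields[0]
        # Assumption: a line with no second field is treated as having no value.
        value = fields[1] if len(fields) > 1 else ""
        rows.append((key, value))
    return rows

def tokenize(value, delimiter):
    """Rough analogue of TOKENIZE(value, delimiter): non-empty tokens only."""
    return [tok for tok in value.split(delimiter) if tok]

def flatten_tokens(rows, delimiter=","):
    """Rough analogue of FOREACH ... GENERATE key, FLATTEN(TOKENIZE(value, ','))."""
    out = []
    for key, value in rows:
        # One output row per token, each repeating the key of its input row.
        for token in tokenize(value, delimiter):
            out.append((key, token))
    return out

data = ["1234 #one,#two,#three", "5679 #one,#two", "1234 #one"]
result = flatten_tokens(load_space_delimited(data))
for row in result:
    print(row)
```

The key point is that FLATTEN on a bag multiplies the row: the first input line has three tokens, so it yields three `(1234, ...)` rows, matching the DUMP output.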

Upvotes: 1
