jonas

Reputation: 15

Change the format of a text file with Apache Pig

I have a txt file which has the following format:

{ (word1),(word2),(word3),....,(wordn) }

The words are NOT in quotes. I would like to use Apache Pig to change the format of this file to one word per line:

word1
word2
word3
wordn    

Is there a way to do this with Apache Pig?

Upvotes: 1

Views: 259

Answers (1)

Sivasakthi Jayaraman

Reputation: 4724

Can you try this?

input

{ (word1),(word2),(word3),(wordn) }

PigScript1:

A = LOAD 'input' AS (mybag:{T:(line:chararray)});
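-- Join the words with a space directly; replacing the default '_' delimiter afterwards would also alter underscores inside words.
B = FOREACH A GENERATE BagToString(mybag.line, ' ');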
STORE B INTO 'output';

Output (stored in the output/part* files):

word1 word2 word3 wordn

Update: If you want each word on its own row instead, use the FLATTEN operator.
PigScript2:

A = LOAD 'input' AS (mybag:{T:(line:chararray)});
B = FOREACH A GENERATE FLATTEN(mybag);
STORE B INTO 'output1';

Output:

word1
word2
word3
wordn
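
If you want to check the result before storing it, here is a minimal sketch of a variant of PigScript2. I have not run it against your data, and the alias name word is just a label I picked; it assumes the same 'input' file and schema as above.

```pig
-- Load each line of the file as a single bag of one-field tuples.
A = LOAD 'input' AS (mybag:{T:(line:chararray)});

-- FLATTEN un-nests the bag, emitting one row per tuple in it;
-- "word" is a hypothetical name for the resulting column.
B = FOREACH A GENERATE FLATTEN(mybag) AS word;

-- Print the schema of B, then print its rows to the console instead of storing them.
DESCRIBE B;
DUMP B;
```

Once the rows look right, replace DUMP B; with the STORE statement from PigScript2.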

Upvotes: 0
