LordBendtner

Reputation: 21

Apache Pig: load data with a multi-character delimiter

Hi everyone, I have a problem loading data with Apache Pig. The file format looks like this:

"1","2","xx,yy","a,sd","3"

So I want to load it using the multi-character delimiter "," (a comma between two double quotes), like this:

A = LOAD 'file.csv' USING PigStorage('","') AS (f1,f2,f3,f4,f5);

but PigStorage doesn't accept the multi-character delimiter ",". How can I do it? Thank you very much!

Upvotes: 2

Views: 2292

Answers (1)

nobody

Reputation: 11080

PigStorage only accepts a single character as its delimiter. You will have to use a loader from PiggyBank, Pig's library of user-contributed functions, instead. Download piggybank.jar, save it in the same folder as your Pig script, and register the jar in your script:

REGISTER piggybank.jar;

DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- CSVLoader splits on commas but keeps commas inside double-quoted fields
A = LOAD 'test1.txt' USING CSVLoader() AS (f1:int,f2:int,f3:chararray,f4:chararray,f5:int);
B = FOREACH A GENERATE f1,f2,f3,f4,f5;
DUMP B;
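To illustrate why a one-character comma delimiter cannot work for this file, and what a quote-aware loader such as CSVLoader should return, here is a small plain-Python sketch (not Pig) that parses the sample record both ways. Python's `csv` module stands in for CSVLoader's quote handling; treating the two as equivalent for this simple input is an assumption.

```python
import csv
import io

# The sample record from the question.
record = '"1","2","xx,yy","a,sd","3"'

# Naive split on a single comma, which is what a one-character delimiter
# such as PigStorage(',') amounts to: the commas inside "xx,yy" and "a,sd"
# are treated as delimiters too, giving 7 pieces instead of 5.
naive = record.split(",")

# Quote-aware parsing: commas inside double-quoted fields are kept and the
# surrounding quotes are removed, giving the 5 intended fields.
quoted = next(csv.reader(io.StringIO(record)))

print(len(naive), naive)    # 7 ['"1"', '"2"', '"xx', 'yy"', '"a', 'sd"', '"3"']
print(len(quoted), quoted)  # 5 ['1', '2', 'xx,yy', 'a,sd', '3']
```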

An alternative is to load each record as a single chararray with TextLoader and then split it with STRSPLIT:

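A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);
-- strip the leading and trailing double quote, then split on the three characters ","
B = FOREACH A GENERATE FLATTEN(STRSPLIT(REPLACE(line, '^"|"$', ''), '","')) AS (f1, f2, f3, f4, f5);
DUMP B;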
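The STRSPLIT route can be checked the same way. STRSPLIT treats its second argument as a Java regular expression; the plain-Python sketch below uses `re.split` as a stand-in, assuming both behave the same for this pattern (it contains no special regex characters). It shows that splitting the raw line on "," yields five fields, but the first and last still carry a stray double quote, which is why the outer quotes have to be trimmed.

```python
import re

record = '"1","2","xx,yy","a,sd","3"'

# Split the raw line on the three-character sequence ","
raw_fields = re.split('","', record)
# The first field keeps its leading quote and the last keeps its trailing one.

# Remove the quote at the very start and very end of the line, then split.
trimmed = re.sub(r'^"|"$', '', record)
fields = re.split('","', trimmed)

print(raw_fields)  # ['"1', '2', 'xx,yy', 'a,sd', '3"']
print(fields)      # ['1', '2', 'xx,yy', 'a,sd', '3']
```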

Upvotes: 2
