Reputation: 515
I'm trying to load a CSV file in Pig Latin. The record format is as follows:
"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010
I tried the following code:
A = LOAD '/user/hduser/salaryTravel.csv' using PigStorage(',') AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
But the output is as follows:
("ABBOTT,DEEDEE W",,,122.10",0,)
The name field is split into separate fields because it contains a comma (','). How can I read this record correctly?
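To illustrate why splitting on every comma breaks this record, here is a minimal sketch in Python rather than Pig. The variable names are made up for illustration. It compares a naive split on ',' (roughly what a plain delimiter-based loader does) with quote-aware CSV parsing, using the sample record above.

```python
import csv

# Sample record copied from the question.
record = ('"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,'
          'ATLANTA INDEPENDENT SCHOOL SYSTEM,2010')

# Naive approach: split on every comma, ignoring the double quotes.
naive_fields = record.split(',')

# Quote-aware approach: commas inside double quotes stay in the field.
quoted_fields = next(csv.reader([record]))

print(len(naive_fields))   # more pieces than the 7 declared columns
print(len(quoted_fields))  # matches the 7 declared columns
print(quoted_fields[0])    # the name, kept as a single field
```

Note that even with quote-aware parsing the salary comes through as the text 52,122.10, so the thousands separator would probably need to be stripped before a numeric cast.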
Upvotes: 1
Views: 1189
Reputation: 2287
I would suggest using CSVExcelStorage or CSVLoader from Piggybank to load the data. Both treat a double-quoted value as a single field, even when it contains commas.
REGISTER piggybank.jar;
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage() AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
or
REGISTER piggybank.jar;
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
Ref: "REGEX_EXTRACT error in PIG", where I have shared a few code samples.
Upvotes: 2