Reputation: 72
I am trying to remove double quotes (") from a file. Some of the fields contain data like "New York, NY". Please advise me what to do. I have tried to delete the (") from the CSV, but it is not working. The steps and code are given below:
I am starting Pig in local mode with pig -x local.
1st Step:
test4 = LOAD '/home/hduser/Desktop/flight_data.csv'
using PigStorage(',') AS (
YEAR: chararray,
QUARTER: chararray,
MONTH: chararray,
DAY_OF_MONTH: chararray,
DAY_OF_WEEK: chararray,
FL_DATE: chararray,
UNIQUE_CARRIER: chararray,
AIRLINE_ID: chararray,
CARRIER: chararray,
TAIL_NUM: chararray,
FL_NUM: chararray,
ORIGIN: chararray,
ORIGIN_CITY_NAME: chararray,
ORIGIN_STATE_ABR: chararray,
ORIGIN_STATE_FIPS: chararray,
ORIGIN_STATE_NM: chararray,
ORIGIN_WAC: chararray,
DEST: chararray,
DEST_CITY_NAME: chararray,
DEST_STATE_ABR: chararray,
DEST_STATE_FIPS: chararray,
DEST_STATE_NM: chararray,
DEST_WAC: chararray,
CRS_DEP_TIME: chararray,
DEP_TIME: chararray,
DEP_DELAY: chararray,
DEP_DELAY_NEW: chararray,
DEP_DEL15: chararray,
DEP_DELAY_GROUP: chararray,
DEP_TIME_BLK: chararray,
TAXI_OUT: chararray,
WHEELS_OFF: chararray,
WHEELS_ON: chararray,
TAXI_IN: chararray,
CRS_ARR_TIME: chararray,
ARR_TIME: chararray,
ARR_DELAY: chararray,
ARR_DELAY_NEW: chararray,
ARR_DEL15: chararray,
ARR_DELAY_GROUP: chararray,
ARR_TIME_BLK: chararray,
CANCELLED: chararray,
CANCELLATION_CODE: chararray,
DIVERTED: chararray,
CRS_ELAPSED_TIME: chararray,
ACTUAL_ELAPSED_TIME: chararray,
AIR_TIME: chararray,
FLIGHTS: chararray,
DISTANCE: chararray,
DISTANCE_GROUP: chararray,
CARRIER_DELAY: chararray,
WEATHER_DELAY: chararray,
NAS_DELAY: chararray,
SECURITY_DELAY: chararray,
LATE_AIRCRAFT_DELAY: chararray);
2nd Step:
new_data = foreach test4 generate
FLATTEN(REGEX_EXTRACT(ORIGIN_CITY_NAME,'."([^"])"',1)) AS StateName;
After running this command, the fields in new_data come out as empty tuples like (). Please suggest an option to overcome this problem. Thanks in advance for your help.
I have also tried another way, given below:
aviation_data = foreach test4 generate
REGEX_EXTRACT($0,'([0-9]+)', 1),
REGEX_EXTRACT($1,'([0-9]+)', 1),
REGEX_EXTRACT($2,'([0-9]+)', 1),
REGEX_EXTRACT($3,'([0-9]+)', 1),
REGEX_EXTRACT($4,'([0-9]+)', 1),
REGEX_EXTRACT($5,'([0-9]+)', 1),
REGEX_EXTRACT($6,'([0-9]+)', 1),
REGEX_EXTRACT($7,'([0-9]+)', 1),
REGEX_EXTRACT($8,'([0-9]+)', 1),
REGEX_EXTRACT($9,'([0-9]+)', 1),
REGEX_EXTRACT($10,'([0-9]+)', 1),
REGEX_EXTRACT($11,'([0-9]+)', 1),
REGEX_EXTRACT($12,'([0-9]+)', 1),
REGEX_EXTRACT($13,'([0-9]+)', 1),
REGEX_EXTRACT($14,'([0-9]+)', 1),
REGEX_EXTRACT($15,'([0-9]+)', 1),
REGEX_EXTRACT($16,'([0-9]+)', 1),
REGEX_EXTRACT($17,'([0-9]+)', 1),
REGEX_EXTRACT($18,'([0-9]+)', 1),
REGEX_EXTRACT($19,'([0-9]+)', 1),
REGEX_EXTRACT($20,'([0-9]+)', 1),
REGEX_EXTRACT($21,'([0-9]+)', 1),
REGEX_EXTRACT($22,'([0-9]+)', 1),
REGEX_EXTRACT($23,'([0-9]+)', 1),
REGEX_EXTRACT($24,'([0-9]+)', 1),
REGEX_EXTRACT($25,'([0-9]+)', 1),
REGEX_EXTRACT($26,'([0-9]+)', 1),
REGEX_EXTRACT($27,'([0-9]+)', 1),
REGEX_EXTRACT($28,'([0-9]+)', 1),
REGEX_EXTRACT($29,'([0-9]+)', 1),
REGEX_EXTRACT($30,'([0-9]+)', 1),
REGEX_EXTRACT($31,'([0-9]+)', 1),
REGEX_EXTRACT($32,'([0-9]+)', 1),
REGEX_EXTRACT($33,'([0-9]+)', 1),
REGEX_EXTRACT($34,'([0-9]+)', 1),
REGEX_EXTRACT($35,'([0-9]+)', 1),
REGEX_EXTRACT($36,'([0-9]+)', 1),
REGEX_EXTRACT($37,'([0-9]+)', 1),
REGEX_EXTRACT($38,'([0-9]+)', 1),
REGEX_EXTRACT($39,'([0-9]+)', 1),
REGEX_EXTRACT($40,'([0-9]+)', 1),
REGEX_EXTRACT($41,'([0-9]+)', 1),
REGEX_EXTRACT($42,'([0-9]+)', 1),
REGEX_EXTRACT($43,'([0-9]+)', 1),
REGEX_EXTRACT($44,'([0-9]+)', 1),
REGEX_EXTRACT($45,'([0-9]+)', 1),
REGEX_EXTRACT($46,'([0-9]+)', 1),
REGEX_EXTRACT($47,'([0-9]+)', 1),
REGEX_EXTRACT($48,'([0-9]+)', 1),
REGEX_EXTRACT($49,'([0-9]+)', 1),
REGEX_EXTRACT($50,'([0-9]+)', 1),
REGEX_EXTRACT($51,'([0-9]+)', 1),
REGEX_EXTRACT($52,'([0-9]+)', 1),
REGEX_EXTRACT($53,'([0-9]+)', 1),
REGEX_EXTRACT($54,'([0-9]+)', 1);
The results are given below:
(2015,1,1,29,4,2015,,20304,,549,4837,,,,,04,,81,,,,,53,,93,1757,1851,54,54,1,3,1700,19,1910,2034,6,2005,2040,35,35,1,2,2000,0,,0,188,169,144,1,1107,5,0,0,0)
None of the text fields are coming through.
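The underlying issue can be seen outside Pig as well: splitting a CSV line on every comma also splits inside quoted fields, so a field like "New York, NY" is cut in half and a regex looking for a closing quote never matches. A minimal Python sketch of the same behavior (the sample line is illustrative, not from the actual file):

```python
import csv
import io

line = '2015,1,"New York, NY",NY'

# Naive split on ',' — the same thing PigStorage(',') does — breaks
# the quoted field into two fragments, each with a stray quote.
naive = line.split(',')
print(naive)   # ['2015', '1', '"New York', ' NY"', 'NY']

# A quote-aware CSV parser keeps the field whole and strips the quotes.
proper = next(csv.reader(io.StringIO(line)))
print(proper)  # ['2015', '1', 'New York, NY', 'NY']
```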
Upvotes: 1
Views: 1206
Reputation: 2287
You can use either org.apache.pig.piggybank.storage.CSVExcelStorage() or org.apache.pig.piggybank.storage.CSVLoader().
Refer to the API links below for details:
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html
test4 = LOAD '/home/hduser/Desktop/flight_data.csv'
USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (....)
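For example, a sketch of the full load (the piggybank jar path is illustrative and depends on your installation; the schema is abbreviated — use the same field list as in your original LOAD):

```
-- Register the piggybank jar that ships CSVExcelStorage (path varies by install)
REGISTER '/usr/lib/pig/piggybank.jar';

test4 = LOAD '/home/hduser/Desktop/flight_data.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (YEAR: chararray,
        QUARTER: chararray,
        /* ... remaining fields as in the original schema ... */
        LATE_AIRCRAFT_DELAY: chararray);
```

With this loader, a quoted field such as "New York, NY" is kept as a single value and the surrounding quotes are stripped, so no REGEX_EXTRACT step is needed.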
Upvotes: 1