Ravinder Karra

Reputation: 307

How to ignore "," in data fields

I am trying to generate following ...

Input:

396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse",Obey_Jony09
396124440112951296,"00:00 #MAW",WesleyBitton

A = LOAD '/user/root/data/tweets.csv' USING PigStorage(',') as (users:chararray, tweets:chararray);
B = FILTER A by users == '396124436476092416';

The output is truncated at the comma inside the quoted tweet: (396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift)

Expected output: (396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse")

I do not want to read each row as a single line.

Upvotes: 0

Views: 170

Answers (2)

Rijul

Reputation: 1445

You can use CSVLoader to load the data.

However, if you do not wish to do that, here is a workaround in Apache Pig itself:

--Load your Data

A  = LOAD 'your/path/users.csv' USING TextLoader() AS (unparsed:chararray);

--Replace each double quote (") with | so that the tweet text is separated from the other fields

B = FOREACH A GENERATE REPLACE(unparsed, '\\"', '|') AS parsed:chararray;

--store your temporary parsed data into your location

STORE B INTO 'your/path/parsed_users.csv' USING PigStorage('|');

--load your parsed data

C = LOAD 'your/path/parsed_users.csv' USING PigStorage('|') AS (users:chararray, tweets:chararray);

--Dump your data. The first field will still contain one extra comma (,), but you can remove it with the REPLACE function in the same way.

DUMP C;
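
The leftover comma can be stripped with one more REPLACE. This is only a minimal sketch, assuming the relation C and the field names users and tweets from the script above; the relation names D and E are hypothetical and chosen just for illustration:

```pig
-- After splitting on '|', the first field still ends with the original
-- comma delimiter, e.g. '396124436476092416,'.
-- Strip every comma from that field. D is a hypothetical relation name.
D = FOREACH C GENERATE REPLACE(users, ',', '') AS users:chararray, tweets;

-- The filter from the question can then match the id exactly (E is hypothetical).
E = FILTER D BY users == '396124436476092416';
DUMP E;
```

Note that this removes all commas from the first field, which is fine for a numeric id but would not be appropriate for a field that can legitimately contain commas.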

Upvotes: 1

54l3d

Reputation: 3973

This fits the CSV standard, so you just need to use CSVLoader, which supports double-quoted fields that contain commas and other double-quotes escaped with backslashes.

This is how to use it:

register file:/home/hadoop/lib/pig/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
A = LOAD '/user/root/data/tweets.csv' USING CSVLoader AS (users:chararray, tweets:chararray); 
B = FILTER A by users == '396124436476092416';

Upvotes: 0
