Reputation: 531
I am loading a file in Pig whose field delimiter is '^A^E^A'.
I tried the command below, but it is not working.
data = LOAD 'test.txt' USING PigStorage('\u0001\u0005\u0001') AS (user, time, query);
Did I miss anything? Or is there a way to specify this delimiter directly in PigStorage? If so, how?
Thanks.
Upvotes: 1
Views: 419
Reputation: 1
File_Data = LOAD 'thedata.csv' USING TextLoader();
Cleansing_Data = FOREACH File_Data GENERATE REPLACE($0, '\\u0001\\u0005\\u0001', ',');
STORE Cleansing_Data INTO 'tmp/Cleansing_Data.txt' USING PigStorage();
Final_Data = LOAD 'tmp/Cleansing_Data.txt' USING PigStorage(',') AS (user, time, query);
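The idea behind this two-pass approach is to turn every ^A^E^A sequence into a single comma and then let PigStorage(',') split on it. As far as I know, Pig's REPLACE is backed by Java's String.replaceAll, so the per-line logic can be sketched in plain Java. This is an illustration only; the class and method names are hypothetical and not part of Pig.

```java
import java.util.Arrays;

// Hypothetical plain-Java sketch of the per-line logic of the two-pass script:
// REPLACE the three-character delimiter with ',', then split on ','.
public class CommaRewriteDemo {
    // The delimiter is three characters: ^A (U+0001), ^E (U+0005), ^A (U+0001).
    static final String DELIM = "\u0001\u0005\u0001";

    // Pass 1: mirrors REPLACE(line, regex, ','). The delimiter contains no
    // regex metacharacters, so replaceAll matches it literally.
    static String rewrite(String line) {
        return line.replaceAll(DELIM, ",");
    }

    // Pass 2: split the rewritten line on commas; limit -1 keeps empty trailing fields.
    static String[] fields(String rewritten) {
        return rewritten.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "alice" + DELIM + "1400000000" + DELIM + "pig query";
        System.out.println(Arrays.toString(fields(rewrite(line))));
    }
}
```

Note that the comma is only a safe replacement if no field (for example `query`) ever contains a comma; otherwise pick a character that cannot occur in the data.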
Upvotes: 0
Reputation: 11080
Load each line as a single field, line:chararray.
Replace the '\u0001\u0005\u0001' sequence with a single '|' or ','.
Split the resulting line on that '|' or ',' to generate the required columns.
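data = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
clean_data = FOREACH data GENERATE REPLACE(line, '\\u0001\\u0005\\u0001', '|') AS line;
-- STRSPLIT takes a regex, so '|' must be escaped; limit 3 yields at most three fields
new_data = FOREACH clean_data GENERATE FLATTEN(STRSPLIT(line, '\\|', 3)) AS (user:chararray, time:chararray, query:chararray);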
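The split step needs care because '|' is a regex metacharacter: Pig's STRSPLIT(string, regex, limit) treats its second argument as a Java regular expression, and an unescaped "|" is an alternation of two empty patterns that matches between every character. The sketch below reproduces the escaped split in plain Java; the class and method names are hypothetical, and I am assuming STRSPLIT's limit behaves like the limit argument of Java's String.split.

```java
import java.util.Arrays;

// Hypothetical plain-Java sketch of the split step after the delimiter was
// replaced by '|'. The regex must be \| (written "\\|" in Java source).
public class PipeSplitDemo {
    static String[] splitOnPipe(String line, int limit) {
        // With limit 3 the result has at most three elements, so any '|'
        // left inside the last field (e.g. in a query) stays in that field.
        return line.split("\\|", limit);
    }

    public static void main(String[] args) {
        String cleaned = "alice|1400000000|pig query";
        System.out.println(Arrays.toString(splitOnPipe(cleaned, 3)));
    }
}
```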
Upvotes: 2
Reputation: 2089
I believe PigStorage does not support a multi-character delimiter such as this sequence of control characters; you may have to write a UDF (a custom load function) to achieve this.
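If you go the UDF route, the part specific to this file format is splitting a line on a literal multi-character delimiter. Below is a minimal sketch of that parsing core under my own assumptions: the class name is hypothetical, and it deliberately does not use the Pig API. A real loader would extend Pig's LoadFunc and wrap these fields in a Tuple returned from getNext().

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parsing core for a custom Pig loader (no Pig dependency here).
public class MultiCharDelimiterParser {
    private final Pattern delimiter;

    public MultiCharDelimiterParser(String delimiter) {
        // Pattern.quote makes the whole delimiter a literal, so control
        // characters and regex metacharacters are matched as-is.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    public List<String> parse(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return Arrays.asList(delimiter.split(line, -1));
    }

    public static void main(String[] args) {
        // ^A ^E ^A, i.e. U+0001 U+0005 U+0001
        MultiCharDelimiterParser p = new MultiCharDelimiterParser("\u0001\u0005\u0001");
        String line = "alice\u0001\u0005\u00011400000000\u0001\u0005\u0001pig query";
        System.out.println(p.parse(line));
    }
}
```

Quoting the delimiter avoids the escaping pitfalls of the REPLACE-based workarounds, at the cost of writing and deploying a jar.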
Upvotes: 0