Reputation: 23
Hi, I am working in big data. Since I am a newbie to Pig programming, please help me get the required output. I have a CSV file with many columns; one of the columns is price, which has data like the following:
(10 Lacs)
(20 to 30 Lacs)
And I need this to be split as
price  min   max
10     null  null
null   20    30
I have tried the following code:
a = LOAD '/user/folder1/filename.csv' using PigStorage(',')as(SourceWebsite:chararray,PropertyType:chararray,PropertyId:chararray,title:chararray,bedroom:int,bathroom:int,Balconies:chararray,price:chararray,pricepersqft:chararray,builtuparea:chararray,address:chararray,otherdetails:chararray,description:chararray,posted:chararray,Features:chararray,ContactDetails:chararray);
b = FOREACH a GENERATE STRSPLIT(price, 'to');
c = FOREACH b GENERATE FLATTEN(STRSPLIT(Price,',')) AS (MAX:int,MIN:int);
dump c;
Any help will be appreciated.
Upvotes: 1
Views: 3175
Reputation: 21563
I just ran into the same issue, and here is how I managed to solve it.
Suppose the column output_raw.output_line_raw looks like this:
abc|def
gh|j
Then I split it into multiple columns like so:
output_in_columns = FOREACH output_raw GENERATE
    FLATTEN(STRSPLIT(output_line_raw, '\\|'));
To test whether it succeeded, I dumped the result after referring to the columns:
output_selection = FOREACH output_in_columns GENERATE
    $0,
    $1;
DUMP output_selection;
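For the price column in the question, extracting the numbers with a regular expression may be easier than splitting. Here is a minimal sketch, assuming the relation a is loaded as in the question with price:chararray, and that the values look like "(10 Lacs)" or "(20 to 30 Lacs)". The alias names prices, price_single, price_min and price_max are made up for illustration. REGEX_EXTRACT returns null when the pattern does not match, which should give the nulls in the desired output.

```pig
-- Sketch only: assumes relation a exists with a chararray field named price,
-- holding values such as '(10 Lacs)' or '(20 to 30 Lacs)'.
prices = FOREACH a GENERATE
    -- a single value: one number followed by 'Lacs', with no 'to' in between
    (int)REGEX_EXTRACT(price, '^\\(?\\s*(\\d+)\\s+Lacs\\s*\\)?$', 1) AS price_single,
    -- a range: the first number before 'to'
    (int)REGEX_EXTRACT(price, '^\\D*(\\d+)\\s+to\\s+\\d+.*$', 1) AS price_min,
    -- a range: the second number after 'to'
    (int)REGEX_EXTRACT(price, '^\\D*\\d+\\s+to\\s+(\\d+).*$', 1) AS price_max;

DUMP prices;
```

The patterns are anchored to the whole string, so they should behave the same whether the expression is evaluated as a full match or as a search. If the real data contains other formats (decimals, "Crore", extra text), the patterns would need adjusting.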
Upvotes: 1