dazzles dina

Reputation: 23

How to split the data in a particular column into two other columns using Pig scripts?

Hi, I am working in big data, and since I am new to Pig programming I need help getting the required output. I have a CSV file with many columns. One of them is price, which contains data like the following:

(10 Lacs)
(20 to 30 Lacs)
I need this split into:

price    min     max
10       null    null
null     20      30

I have tried the following code


a = LOAD '/user/folder1/filename.csv' USING PigStorage(',') AS (
    SourceWebsite:chararray, PropertyType:chararray, PropertyId:chararray,
    title:chararray, bedroom:int, bathroom:int, Balconies:chararray,
    price:chararray, pricepersqft:chararray, builtuparea:chararray,
    address:chararray, otherdetails:chararray, description:chararray,
    posted:chararray, Features:chararray, ContactDetails:chararray);
b = FOREACH a GENERATE STRSPLIT(price, 'to');
c = FOREACH b GENERATE FLATTEN(STRSPLIT(Price,',')) AS (MAX:int,MIN:int);
dump c;

Any help will be appreciated.

Upvotes: 1

Views: 3175

Answers (1)

Dennis Jaheruddin

Reputation: 21563

I just ran into the same issue, and here is how I managed to solve it.

Suppose the column output_raw.output_line_raw looks like this:

abc|def
gh|j

Then I split it into multiple columns like so:

-- STRSPLIT takes a regular expression, so the pipe has to be escaped as '\\|'
output_in_columns = FOREACH output_raw GENERATE
    FLATTEN(STRSPLIT(output_line_raw,'\\|'));
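If you would rather refer to the new columns by name than by position, the flattened tuple can be given a schema with AS. This is a minimal sketch, assuming each line contains at most one pipe; the limit argument 2 tells STRSPLIT to produce no more than two pieces, and the field names left_part and right_part are hypothetical.

```pig
-- Same split as above, but the two resulting fields get names and types,
-- so later statements can use left_part/right_part instead of $0/$1.
-- The third STRSPLIT argument (2) caps the result at two pieces.
output_named = FOREACH output_raw GENERATE
    FLATTEN(STRSPLIT(output_line_raw, '\\|', 2)) AS (left_part:chararray, right_part:chararray);

DUMP output_named;
```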

To check that it worked, I selected the new columns by position and dumped the result:

output_selection = FOREACH output_in_columns GENERATE
    $0,
    $1; 

DUMP output_selection;
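For the price column in the question, splitting on 'to' leaves the parenthesis and the word "Lacs" attached to the pieces (for example "(20 " and " 30 Lacs)"), so REGEX_EXTRACT may be a better fit than STRSPLIT. REGEX_EXTRACT returns null when the pattern does not match, which produces the nulls in the desired output. This is a sketch under stated assumptions: relation a is loaded as in the question, every price value looks like one of the two examples (whole numbers, the word "to", the unit "Lacs"), and the output names min_price and max_price are hypothetical.

```pig
-- Assumes relation a was loaded with the schema from the question.
prices = FOREACH a GENERATE
    -- "(10 Lacs)": a single number directly followed by "Lacs";
    -- a range such as "(20 to 30 Lacs)" does not match, so this is null
    (int)REGEX_EXTRACT(TRIM(price), '^\\(?\\s*(\\d+)\\s*Lacs', 1) AS price,
    -- "(20 to 30 Lacs)": the number before "to" (null for single values)
    (int)REGEX_EXTRACT(TRIM(price), '(\\d+)\\s*to\\s*\\d+', 1) AS min_price,
    -- "(20 to 30 Lacs)": the number after "to" (null for single values)
    (int)REGEX_EXTRACT(TRIM(price), '\\d+\\s*to\\s*(\\d+)', 1) AS max_price;

DUMP prices;
```

Values in any other format (decimals, "Cr" instead of "Lacs") will come out as null in all three columns, so the patterns would need extending for real data.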

Upvotes: 1
