Reputation: 1491
I want to generate the below output from the given input. What will be the best way to get that.
Input:-
"column,1A,extra-A1,extra-A2",column2A,column3A
"((column,1B,extra-B1))",column2B,column3B
"column,1C,extra-C1,extra-C2,extra-C3,extra-C4",column2C,column3C
"column,1D,extra-D1",column2D,column3D
Output:-
column,1A,extra-A1,extra-A2|column2A|column3A
((column,1B,extra-B1))|column2B|column3B
column,1C,extra-C1,extra-C2,extra-C3,extra-C4|column2C|column3C
column,1D,extra-D1|column2D|column3D
Upvotes: 0
Views: 204
Reputation: 936
I am using Regular Expression:
Let be try this code:
a = LOAD '/home/hduser/pig_ex1/sample1.txt' as line;
b = FOREACH a GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'["](.*)["][,](.*)[,](.*)')) AS (f1,f2,f3);
c = FOREACH b GENERATE CONCAT(f1,'|',f2,'|',f3);
dump c;
Upvotes: 0
Reputation: 1491
I am able to resolve it using below, let me know if you have any better option
Input:-
"column,1A,extra-A1,extra-A2",column2A,column3A
"((column,1B,extra-B1))",column2B,column3B
"column,1C,extra-C1,extra-C2,extra-C3,extra-C4",column2C,column3C
"column,1D,extra-D1",column2D,column3D
Pig Script:-
A = LOAD '/home/hduser/pig_ex1/sample1.txt' AS line;
B = FOREACH A GENERATE SUBSTRING(line,1,(LAST_INDEX_OF(line,'"'))) AS firstcol, SUBSTRING(line,(LAST_INDEX_OF(line,'"')+2),(INT) SIZE(line)) as lastcol;
C = FOREACH B GENERATE firstcol, FLATTEN(STRSPLIT(lastcol,'\\,',2)) AS (secondcol,thirdcol);
D = FOREACH C GENERATE CONCAT(firstcol,'|',secondcol,'|',thirdcol);
Output:-
(column,1A,extra-A1,extra-A2|column2A|column3A)
(((column,1B,extra-B1))|column2B|column3B)
(column,1C,extra-C1,extra-C2,extra-C3,extra-C4|column2C|column3C)
(column,1D,extra-D1|column2D|column3D)
Upvotes: 1
Reputation: 11
Try org.apache.pig.piggybank.storage.CSVExcelStorage(','); from piggybank jar.
Upvotes: 0