Koushik Chandra

Reputation: 1491

Change separator and generate output through Pig

I want to generate the output below from the given input. What is the best way to do that?

Input:-
"column,1A,extra-A1,extra-A2",column2A,column3A
"((column,1B,extra-B1))",column2B,column3B
"column,1C,extra-C1,extra-C2,extra-C3,extra-C4",column2C,column3C
"column,1D,extra-D1",column2D,column3D


Output:-
column,1A,extra-A1,extra-A2|column2A|column3A
((column,1B,extra-B1))|column2B|column3B
column,1C,extra-C1,extra-C2,extra-C3,extra-C4|column2C|column3C
column,1D,extra-D1|column2D|column3D

Upvotes: 0

Views: 204

Answers (3)

Ankur Singh

Reputation: 936

You can do this with a regular expression. Try this code:

a = LOAD '/home/hduser/pig_ex1/sample1.txt' as line;
b = FOREACH a GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'["](.*)["][,](.*)[,](.*)'))  AS (f1,f2,f3);

c = FOREACH b GENERATE CONCAT(f1,'|',f2,'|',f3);

dump c;

Upvotes: 0

Koushik Chandra

Reputation: 1491

I was able to resolve it using the script below. Let me know if you have a better option.

Input:-
"column,1A,extra-A1,extra-A2",column2A,column3A
"((column,1B,extra-B1))",column2B,column3B
"column,1C,extra-C1,extra-C2,extra-C3,extra-C4",column2C,column3C
"column,1D,extra-D1",column2D,column3D

Pig Script:-
A = LOAD '/home/hduser/pig_ex1/sample1.txt' AS line;
B = FOREACH A GENERATE SUBSTRING(line,1,(LAST_INDEX_OF(line,'"'))) AS firstcol, SUBSTRING(line,(LAST_INDEX_OF(line,'"')+2),(INT) SIZE(line)) as lastcol;
C = FOREACH B GENERATE firstcol, FLATTEN(STRSPLIT(lastcol,'\\,',2)) AS (secondcol,thirdcol);
D = FOREACH C GENERATE CONCAT(firstcol,'|',secondcol,'|',thirdcol);
DUMP D;

Output:-
(column,1A,extra-A1,extra-A2|column2A|column3A)
(((column,1B,extra-B1))|column2B|column3B)
(column,1C,extra-C1,extra-C2,extra-C3,extra-C4|column2C|column3C)
(column,1D,extra-D1|column2D|column3D)

Upvotes: 1

Naman Mahor

Reputation: 11

Try org.apache.pig.piggybank.storage.CSVExcelStorage(',') from the piggybank jar.
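
A minimal sketch of how that loader could be used, reusing the input path from the other answers. The jar location, the field names c1 to c3, and the output directory are illustrative assumptions, not from the original post.

```pig
-- The path to piggybank.jar is a placeholder; adjust it for your installation.
REGISTER '/path/to/piggybank.jar';

-- CSVExcelStorage reads a double-quoted field as a single value,
-- so the commas inside the quotes are not treated as separators.
A = LOAD '/home/hduser/pig_ex1/sample1.txt'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (c1:chararray, c2:chararray, c3:chararray);

-- PigStorage('|') writes the three fields separated by a pipe.
-- The output directory name is hypothetical.
STORE A INTO '/home/hduser/pig_ex1/output_pipe' USING PigStorage('|');
```

Because STORE writes delimited text rather than printing tuples, the resulting files should not contain the surrounding parentheses that DUMP shows.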

Upvotes: 0
