Dipak Dendage

Reputation: 648

How to process a multi-delimiter file in Pig 0.8

I have an input text file (named multidelimiter) with the following records:

1,Mical,2000;10
2,Smith,3000;20 

I have written the following Pig code:

A =LOAD '/user/input/multidelimiter' AS line;
B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)[,](.*)[,](.*)[;]')) AS (f1,f2,f3,f4);

But this code does not work; it gives the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 78.  Encountered: <EOF> after : "\'(.*)[,](.*)[,](.*)[;"

I referred to the following question but was not able to resolve the error:

how to load files with different delimiter each time in piglatin

Please help me resolve this error.

Thanks.

Upvotes: 1

Views: 461

Answers (2)

Dipak Dendage

Reputation: 648

I finally found a solution.

Here is my solution:

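-- Split on ',' first; the last field ("line") still holds e.g. "2000;10"
A = LOAD '/user/input/multidelimiter' USING PigStorage(',') AS (empid, ename, line);
-- \\u003B is the regex escape for ';', used instead of a literal semicolon
B = FOREACH A GENERATE empid, ename, FLATTEN(REGEX_EXTRACT_ALL(line, '(.*)\\u003B(.*)')) AS (sal:int, deptno:int);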
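For comparison, the same `\\u003B` escape should also let the question's original single-regex approach parse, provided the pattern gets a fourth group after the semicolon: REGEX_EXTRACT_ALL has to match the whole line, and the question's pattern stops at `[;]`, leaving the trailing `10` unmatched (it also has only three groups for four aliases). The error message in the question ends at `[;`, which suggests Grunt cut the statement at the semicolon inside the quoted regex. A minimal, untested sketch, assuming the records contain no tab characters so the default tab-delimited loader returns each whole line as one field:

```pig
-- Assumes no tabs in the data, so default PigStorage yields the whole line as one field.
A = LOAD '/user/input/multidelimiter' AS (line:chararray);
-- Four groups: id, name, salary, department; \\u003B is the regex escape for ';'.
-- REGEX_EXTRACT_ALL must match the entire line, hence the final (.*) group.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '(.*),(.*),(.*)\\u003B(.*)')) AS (f1, f2, f3, f4);
DUMP B;
```

The four fields come out untyped here; cast them in a later FOREACH if numeric types are needed.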

Upvotes: 1

kecso

Reputation: 2485

A solution for your input example: LOAD the file as comma-separated, then STRSPLIT the last field on ';' and FLATTEN the result.
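A minimal sketch of that approach, reusing the path and field names from the accepted answer; `rest` is a hypothetical alias for the unsplit remainder. Since STRSPLIT's second argument is a regex, the `;` is again written as `\\u003B`, on the assumption that a literal semicolon inside the quotes would hit the same Grunt parse error shown in the question:

```pig
-- Split on ',' first; for "1,Mical,2000;10" the third field holds "2000;10".
A = LOAD '/user/input/multidelimiter' USING PigStorage(',') AS (empid:int, ename:chararray, rest:chararray);
-- STRSPLIT returns a tuple of at most 2 pieces; FLATTEN turns them into separate columns.
B = FOREACH A GENERATE empid, ename, FLATTEN(STRSPLIT(rest, '\\u003B', 2)) AS (sal, deptno);
DUMP B;
```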

Upvotes: 1
