readyforhdp

Reputation: 1

Pig custom function to load a multi-character ^^ (double caret) delimiter

I am new to Pig. Can someone help me load a file that uses multiple characters (in my case '^^') as the column delimiter?

For example, I have a file with the following columns:
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange

Regards

Upvotes: 0

Views: 762

Answers (1)

Sivasakthi Jayaraman

Reputation: 4724

A regex is well suited to multi-character delimiters like this one.

input.txt
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange

PigScript
A = LOAD 'input.txt' AS line;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)')) AS (f1,f2,f3,f4,f5);
DUMP B;

Output:
(aisforapple,bisforball,cisforcat,disfordoll,andeisforelephant)
(fisforfish,gisforgreen,hisforhat,iisforicecreem,andjisforjar)
(kisforking,lisforlion,misformango,nisfornose,andoisfororange)

Explanation:

For clarity, here is the regex broken into one piece per line:
(.*)\\^\\^ -> Matches any characters up to ^^ and stores them in f1 (^ is a regex special character, so it is escaped with a backslash, and the backslash is doubled inside the Pig string)
(.*)\\^\\^ -> Matches any characters up to the next ^^ and stores them in f2
(.*)\\^\\^ -> Matches any characters up to the next ^^ and stores them in f3
(.*)\\^\\^ -> Matches any characters up to the next ^^ and stores them in f4
(.*)       -> Matches any characters up to the end of the string and stores them in f5
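
Pig runs on the JVM and its regex built-ins use Java regular expressions, so the pattern above can be checked outside Pig before running a job. The following is a minimal sketch in plain Java, not part of the Pig script; the class name CaretDelimiterDemo and the method extractFields are hypothetical names chosen for illustration. It assumes, as the script does, that every line has exactly five fields.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretDelimiterDemo {
    // Same pattern as in the Pig script. As in the Pig string literal,
    // each backslash is doubled so the regex engine sees \^\^.
    static final Pattern FIVE_FIELDS =
        Pattern.compile("(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)");

    // Returns the five captured fields, or null when the whole line
    // does not match the pattern.
    static String[] extractFields(String line) {
        Matcher m = FIVE_FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String line =
            "aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant";
        // Prints the five fields joined by commas.
        System.out.println(String.join(",", extractFields(line)));
        // A line with only three fields does not match at all.
        System.out.println(extractFields("a^^b^^c") == null);
    }
}
```

Because (.*) is greedy, a line with more than four ^^ delimiters still matches, but the extra delimiters end up inside f1. A line with fewer delimiters does not match at all, so the pattern is only reliable when the column count is fixed.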

Upvotes: 2
