Surender Raja

Reputation: 3599

StrSplit in Pig functions

Can someone explain how to get the output below in a Pig script?

My input file is below:

a.txt

aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data

I am writing the Pig script like this:

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1)),a2,a3;

I don't know how to proceed from here. I need output like the one below; basically, I need all the characters after the dot in the first field.

(kyl,data,data)
(kkk,data,data)
(hj,data,data)
(dff,data,data)

Can someone give me the code for this?

Upvotes: 6

Views: 12009

Answers (3)

gbharat

Reputation: 276

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);

B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\.')),a2,a3;

This separates a1 into two parts, the part before the dot and the part after it, so you can then select the part after the dot:

C = foreach B generate $1,$2,$3;

where $1 is the part after the dot.
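For reference, here is a complete sketch of this approach, assuming the same a.txt. The dot is written as '\\.' because STRSPLIT treats its second argument as a regular expression, in which a bare . matches any character. It also passes STRSPLIT's optional limit argument of 2, so a value with more than one dot keeps everything after the first dot in the second field. The field names prefix and suffix are my own, not from the answer.

```pig
-- Load the three comma-separated columns.
A = LOAD 'a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);

-- Split a1 on a literal dot: '\\.' reaches the regex engine as \.
-- The limit of 2 stops after the first dot, so later dots stay in the second piece.
-- prefix/suffix are illustrative names chosen for this sketch.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1, '\\.', 2)) AS (prefix:chararray, suffix:chararray), a2, a3;

-- Keep only the part after the first dot plus the other two columns.
C = FOREACH B GENERATE suffix, a2, a3;

DUMP C;
```

Naming the flattened fields avoids relying on positions like $1, which would shift if the split ever produced a different number of pieces.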

Upvotes: 1

Rengasamy

Reputation: 1043

Instead of STRSPLIT(), you can try SUBSTRING() together with INDEXOF():

A = LOAD 'C:\\Users\\Ren\\Desktop\\file' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray); 

B = foreach A generate SUBSTRING(a1,INDEXOF(a1,'.',0)+1,(int)SIZE(a1)),a2,a3;
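A fuller version of this SUBSTRING approach might look like the sketch below, assuming the question's a.txt; the output name suffix is my own. INDEXOF does a plain character search rather than a regex match, so the dot needs no escaping here. If a value contains no dot, INDEXOF returns -1, the start index becomes 0, and the whole of a1 is returned unchanged.

```pig
A = LOAD 'a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);

-- INDEXOF finds the first literal '.' in a1, searching from index 0;
-- SUBSTRING then takes everything from the next character to the end.
-- SIZE returns a long for a chararray, hence the (int) cast.
-- 'suffix' is an illustrative name chosen for this sketch.
B = FOREACH A GENERATE
        SUBSTRING(a1, INDEXOF(a1, '.', 0) + 1, (int)SIZE(a1)) AS suffix,
        a2, a3;

DUMP B;
```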

Upvotes: 4

Rajnish G

Reputation: 237

Here is what you need to do:

There is an escaping problem with the dot: STRSPLIT treats its delimiter as a regular expression, in which an unescaped dot matches any character, and the dot is also an operator in Pig (see the Pig documentation on the Dot Operator for more information).

Instead, you can use the Unicode escape sequence for a dot, \u002E. However, it must itself be backslash-escaped and placed in a single-quoted string, i.e. '\\u002E'.

The code below does the job, and you can fine-tune it as needed:

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;
C = FOREACH B GENERATE a1of1,a2,a3;
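If you would rather skip the split-and-flatten step entirely, Pig's built-in REGEX_EXTRACT(string, regex, index) can capture the part after the first dot directly. This is a swapped-in alternative, not part of the answer above; the pattern and the field name suffix are my own, and it assumes the same a.txt.

```pig
A = LOAD 'a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);

-- After Pig unescapes the literal, the pattern is  ^[^.]*\.(.*)
--   [^.]*  skips everything before the first dot (a dot inside [] is literal)
--   \.     matches that first dot
--   (.*)   captures the rest of the value, returned as group 1
-- Values without a dot do not match, so suffix should come back null for them.
B = FOREACH A GENERATE REGEX_EXTRACT(a1, '^[^.]*\\.(.*)', 1) AS suffix, a2, a3;

DUMP B;
```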

Hope this helps.

Upvotes: 10
