Reputation: 82
I'm trying to write a Pig script that takes string data like abc|def|xyz and puts the values into an array of strings.
How do I split this string to get an array of strings like [abc,def,xyz]?
I tried using the STRSPLIT function, but the number of splits in my case is not fixed. The number of pipe-separated values can vary, and I need all of those values to be in the array.
Any suggestions?
Upvotes: 1
Views: 4419
Reputation: 2287
Another feasible option is to use TOKENIZE, though I would suggest going with the solution from @Balduz.
A = load 'data.txt' using PigStorage(',');
B = foreach A generate BagToString(TOKENIZE($0,'|'),',');
DUMP B;
Output of DUMP B:
(abc,def,xyz)
(abc,def,xyz,abc,def,xyz)
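If you need the values as a collection rather than as one comma-joined string, a minimal sketch is to drop BagToString and keep the bag that TOKENIZE returns. This assumes the same data.txt and the same load statement; the alias C is just a name picked for illustration.

```pig
A = load 'data.txt' using PigStorage(',');
-- TOKENIZE returns a bag holding one single-field tuple per token
C = foreach A generate TOKENIZE($0,'|');
DUMP C;
```

For the row abc|def|xyz this should give ({(abc),(def),(xyz)}), so keep in mind that the result is a bag of tuples, not a flat tuple.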
Upvotes: 2
Reputation: 3570
You were on the right track, but there is one thing about STRSPLIT you didn't notice: you can also use it when the number of splits is not fixed. The third argument of that UDF is the number of 'splits' you want, but you can pass a negative number and it will return all the possible splits that match your expression.
From the official documentation for STRSPLIT:
limit
If the value is positive, the pattern (the compiled representation of the regular expression) is applied at most limit-1 times, therefore the value of the argument means the maximum length of the result tuple. The last element of the result tuple will contain all input after the last match.
If the value is negative, no limit is applied for the length of the result tuple.
Imagine this input:
abc|def|xyz,1
abc|def|xyz|abc|def|xyz,2
You can do the following:
A = load 'data.txt' using PigStorage(',');
B = foreach A generate STRSPLIT($0,'\\|',-1);
And the output will be:
DUMP B;
((abc,def,xyz))
((abc,def,xyz,abc,def,xyz))
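To see what the limit argument does when it is positive, here is a small sketch on the same input with a limit of 2. The alias C is only an illustrative name, and the result described below is what I would expect from the documentation quoted above rather than something taken from the docs verbatim.

```pig
A = load 'data.txt' using PigStorage(',');
-- limit = 2: the pattern is applied at most once, so the tuple has at most 2 fields
C = foreach A generate STRSPLIT($0,'\\|',2);
DUMP C;
```

With that limit, the first row should come out as ((abc,def|xyz)): the last field keeps everything after the first match, which is why you want -1 when the number of values varies.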
Upvotes: 4