Reputation: 99
The file has this data:
A 12345
B 32122
C 23232
Is there an option to run the Pig script only once and store the first record (A 12345) in one file, the second record (B 32122) in a second file, and the third record (C 23232) in a third file? Right now, when we run the Pig script, it runs a separate job for each STORE. Please let me know the option.
Upvotes: 2
Views: 4439
Reputation: 5881
Use the SPLIT operator to partition the contents of a relation into two or more relations based on some expression. Depending on the conditions stated in the expression:
A tuple may be assigned to more than one relation.
A tuple may not be assigned to any relation.
Example
In this example relation A is split into three relations, X, Y, and Z.
A = LOAD 'data' AS (f1:int,f2:int,f3:int);
DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
DUMP X;
(1,2,3)
(4,5,6)
DUMP Y;
(4,5,6)
DUMP Z;
(1,2,3)
(7,8,9)
Then STORE X, Y, and Z, each under the file name you want.
My aim was to read a file and write its records to different files based on some criteria, so this should fit your problem.
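Applied to the data in the question, the SPLIT approach might look like the sketch below. The field names (name, num), the relation names, and the output paths are made up for illustration, and the space delimiter is assumed from the sample data. When the statements are run together as one script, Pig's multi-query execution may be able to share the load across the three STORE statements rather than reading the input once per STORE, but that depends on the Pig version and how the script is run.

```
-- Hypothetical names and paths; adjust to your data.
records = LOAD 'input.txt' USING PigStorage(' ') AS (name:chararray, num:int);

-- Route each record to a relation based on its first field.
SPLIT records INTO a_rows IF name == 'A',
                   b_rows IF name == 'B',
                   c_rows IF name == 'C';

-- Each STORE writes its own output directory.
STORE a_rows INTO 'out_a' USING PigStorage(' ');
STORE b_rows INTO 'out_b' USING PigStorage(' ');
STORE c_rows INTO 'out_c' USING PigStorage(' ');
```

Note that this needs one condition and one STORE per key, so it only suits a small, known set of values.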
Upvotes: 1
Reputation: 4724
You can try the MultiStorage() option, which is available in the piggybank jar. You need to download pig-0.11.1.jar and set it in your classpath.
Example:
input.txt
A 12345
B 32122
C 23232
PigScript:
A = LOAD 'input.txt' USING PigStorage(' ') AS (f1,f2);
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
Now the output folder contains three directories, A, B, and C, and the files inside them (A-0,000, B-0,000, and C-0,000) contain the actual values:
output$ ls
A B C _SUCCESS
output$ cat A/A-0,000
A 12345
output$ cat B/B-0,000
B 32122
output$ cat C/C-0,000
C 23232
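A slightly fuller sketch of the same script is below. It assumes that the piggybank jar has to be registered explicitly (the path shown is a placeholder, not a real location) and uses the four-argument form of MultiStorage, which takes a compression setting and a field delimiter in addition to the parent path and the index of the field to split on. Treat the 'none' and ' ' values as assumptions to verify against your piggybank version.

```
-- Placeholder path; point this at wherever piggybank.jar lives on your system.
REGISTER /path/to/piggybank.jar;

A = LOAD 'input.txt' USING PigStorage(' ') AS (f1:chararray, f2:chararray);

-- Arguments: parent output path, index of the field to split on,
-- compression ('none' here), and the output field delimiter (a space here).
STORE A INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ' ');
```

Unlike SPLIT, this needs no per-key condition: one STORE creates a directory for every distinct value of the chosen field.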
Upvotes: 0
Reputation: 180
Pig is not really made for this. If you still want to do it, you will have to write a custom store function, that is, a class that extends StoreFunc. Inside it you will have to use multiple outputs, since you want to store to three different files.
Refer to https://pig.apache.org/docs/r0.7.0/udf.html#Store+Functions for details on custom store functions.
Otherwise, in Pig, one STORE command stores only one alias, in only one location.
For this kind of requirement, you are better off writing a Java MapReduce job.
Upvotes: 0