jk7

Reputation: 99

Store multiple files using the same Pig script

The file has the following data:

A 12345
B 32122
C 23232

What is the option to run the Pig script only once and store the first record (A 12345) in one file, the second record (B 32122) in a second file, and the third (C 23232) in a third file? Right now, when we run the Pig script, it runs a separate job for each STORE. Please let me know the option.

Upvotes: 2

Views: 4439

Answers (3)

Kishore

Reputation: 5881

Use the SPLIT operator to partition the contents of a relation into two or more relations based on some expression. Depending on the conditions stated in the expression:

A tuple may be assigned to more than one relation.

A tuple may not be assigned to any relation.

Example

In this example relation A is split into three relations, X, Y, and Z.

A = LOAD 'data' AS (f1:int,f2:int,f3:int);

DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)

SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);

DUMP X;
(1,2,3)
(4,5,6)

DUMP Y;
(4,5,6)

DUMP Z;
(1,2,3)
(7,8,9)

Then STORE X, Y, and Z into whatever file names you need.
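Applied to the data in the question, a minimal sketch might look like this. It assumes the file is called `input.txt` with the two fields separated by a single space; the output paths `out_A`, `out_B`, and `out_C` are hypothetical names to replace with your own. Because all three STOREs come from one SPLIT of a single LOAD, Pig's multi-query execution (enabled by default when a script runs in batch mode) should be able to combine them into one job instead of running one job per STORE.

```pig
-- Sketch: the input file name and the output paths are assumptions.
records = LOAD 'input.txt' USING PigStorage(' ') AS (key:chararray, value:chararray);

-- Route each record to its own relation based on the key in the first field.
SPLIT records INTO recA IF key == 'A',
                   recB IF key == 'B',
                   recC IF key == 'C';

-- Each STORE writes one relation to its own output directory (as part files).
STORE recA INTO 'out_A' USING PigStorage(' ');
STORE recB INTO 'out_B' USING PigStorage(' ');
STORE recC INTO 'out_C' USING PigStorage(' ');
```

Note that each output path is a directory containing part files rather than a single plain file.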

The aim is to read the file once and write each record into a different file based on a criterion, which fits your problem.

Upvotes: 1

Sivasakthi Jayaraman

Reputation: 4724

You can try the MultiStorage() store function, which is available in the piggybank jar. You need to download pig-0.11.1.jar and add it to your classpath.

Example:
input.txt

A 12345
B 32122
C 23232

PigScript:

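-- Assumes the piggybank jar (which contains MultiStorage) is on the classpath.
A = LOAD 'input.txt' USING PigStorage(' ') AS (f1,f2);
-- MultiStorage('output', '0'): write each record under output/<value of field 0>/
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');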

Now the output folder contains three directories, A, B, and C, and the files inside them (A-0,000, B-0,000, and C-0,000) contain the actual values:
output$ ls

A       B       C       _SUCCESS

output$ cat A/A-0,000

A   12345

output$ cat B/B-0,000

B   32122

output$ cat C/C-0,000

C   23232
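The tab between the key and the value in these files appears to be MultiStorage's default field delimiter. If you want the output to keep the space-separated layout of `input.txt`, MultiStorage also has constructors that take a compression setting and a field delimiter. The sketch below assumes the four-argument form `MultiStorage(parentPath, splitFieldIndex, compression, fieldDelimiter)` with `'none'` as the "no compression" value is present in your piggybank version; check its Javadoc before relying on it.

```pig
-- Assumes the piggybank jar (with MultiStorage) is already on the classpath.
A = LOAD 'input.txt' USING PigStorage(' ') AS (f1:chararray, f2:chararray);

-- Split on field 0, no compression, single space as the output field delimiter
-- (four-argument constructor assumed to exist in your piggybank version).
STORE A INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ' ');
```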

Upvotes: 0

Yashodhan K

Reputation: 180

Actually, Pig is not made for this. But if you still want to do it, you will have to write a custom store function: a class that extends the StoreFunc class. Inside it, you will have to use multiple outputs, since you want to store into three different files.

Refer to https://pig.apache.org/docs/r0.7.0/udf.html#Store+Functions for how to write a custom store function.

Otherwise, in Pig one STORE command stores only one alias, into only one output.

For this kind of requirement, you are better off writing a Java MapReduce job.

Upvotes: 0
