Blue Diamond

Reputation: 3069

Storing the result of Pig "GROUP BY" into HDFS

I am looking for a way to store the output of a GROUP BY statement in Pig into a file. When dumped, the grouped relation looks like this:

(D1,{(A1,null,C1,D1,E1),(null,B1,C1,D1,E1),(A2,null,null,D1,E2)})
(C1,{(A1,null,C1,D1,E1),(null,B1,C1,D1,E1)})

I have tried the STORE command, but the data is not written exactly as it is displayed.

store F into '/tmp/group_out';

Is there an alternative approach that writes the data to a file in the same form in which it is displayed?

Upvotes: 0

Views: 214

Answers (1)

Jakub Kotowski

Reputation: 7571

The default PigStorage function (used when you call STORE without a USING clause) is configurable to some extent: http://pig.apache.org/docs/r0.12.0/func.html#pigstorage. For example, you can set the field delimiter.

You will have to implement a custom Store UDF if you need a special format for storing your data.
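As a rough sketch of the first option, the script below stores the grouped relation with a comma as the field delimiter instead of the default tab. The input path, the field names, and the relation name A are assumptions made for illustration; only F and '/tmp/group_out' come from the question. As far as I know, the bag is still written in its {(...),(...)} form, but the outer parentheses that DUMP prints are not written and null fields come out as empty strings, so the result is close to the displayed form without being identical.

```pig
-- Hypothetical input: five chararray columns, comma-separated.
A = LOAD '/tmp/input' USING PigStorage(',')
    AS (a:chararray, b:chararray, c:chararray, d:chararray, e:chararray);

-- Group on the fourth column, as in the sample output keyed by D1.
F = GROUP A BY d;

-- Same STORE as in the question, but with an explicit field delimiter.
STORE F INTO '/tmp/group_out' USING PigStorage(',');

-- A custom store function would be plugged in the same way.
-- The jar and class names below are placeholders, not real artifacts:
-- REGISTER my-udfs.jar;
-- STORE F INTO '/tmp/group_out' USING com.example.DumpLikeStorage();
```

If the exact DUMP layout (outer parentheses, the literal word null) is required, that is the case where a custom Store UDF becomes necessary.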

Upvotes: 1
