Claudio Lombardi

Reputation: 3

How do I add a sequence column to a CSV file with a Pig script?

I have a CSV file with a single column. I would like to add a column of sequential numbers to it, and then use that column to link the fields with a join.

Column_A
-----------
claudio
carlo
pierluigi
giovanni

Result:

Column_A    |Column_B
---------------------
claudio     | 1
carlo       | 2
pierluigi   | 3
giovanni    | 4

Alternatively, is there a way to merge the columns of two files row by row, so that the fields to be joined end up side by side?

FILE 1:

Column_A
-------------
claudio
carlo
pierluigi
giovanni

FILE 2:

Column_B
-------------
napoli
roma
milano
genova

Result:

Column_A   | Column_B
---------------------
claudio    | napoli
carlo      | roma
pierluigi  | milano
giovanni   | genova

Upvotes: 0

Views: 55

Answers (1)

AntonyBrd

Reputation: 403

There are several ways to do this with Apache Pig.

Since version 0.11 you can use the RANK operator. When no sort field is given, it simply prepends a sequential row number to each tuple.

-- First load your first csv file
A1 = LOAD '/path/to/file/file1.csv' USING PigStorage(',') AS (name:CHARARRAY);
-- Then RANK: this adds the row number as the first field
B1 = RANK A1;
-- Look at the results
DUMP B1;
-- Load the second csv file and rank it the same way
A2 = LOAD '/path/to/file/file2.csv' USING PigStorage(',') AS (city:CHARARRAY);
B2 = RANK A2;
-- Then join by id (row number)
C = JOIN B1 BY $0, B2 BY $0;
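
To get the two-column result from the question, the rank columns can be dropped after the join. This is a minimal sketch, assuming the aliases B1, B2 and C from the script above and a join output laid out as (rank, name, rank, city). The aliases D and E and the output path are placeholders, not part of the original script.

```pig
-- Sort by the shared row number so rows follow the original file order
D = ORDER C BY $0;
-- Keep only name and city; $0 and $2 are the two rank columns
E = FOREACH D GENERATE $1 AS name, $3 AS city;
-- Hypothetical output path; Pig writes a directory of part files here
STORE E INTO '/path/to/output' USING PigStorage(',');
```

The fields are referenced by position here to avoid depending on the names RANK generates for its column; you can run DESCRIBE C; to check the actual schema before projecting.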

Upvotes: 1
