visakh

Reputation: 2553

How to refer to the columns in Pig output

I have a data file that contains values as:

A 1
B 2
C 3
C 3

I wrote the following Pig script:

A = LOAD 'users.txt' AS (usr: chararray, nod: int);
B = GROUP A BY usr;
C = FOREACH B GENERATE group,COUNT(A);

Now I want to take the output C and process it further. How can I refer to the values/columns in C? I tried DUMPing the data, but the rows came out looking like key-value pairs. Do I need to write this output to a file, load it again, and then process it?

Thanks, TM

Upvotes: 0

Views: 3912

Answers (2)

Alex A.

Reputation: 2736

If naming the columns is cumbersome, for instance when there are a large number of them, you can also refer to columns by their zero-based position ($0, $1, and so on). The code equivalent to what Davis Broda posted would look like this:

 C = FOREACH B GENERATE group,COUNT(A);
 D = FOREACH C GENERATE $0, $1;
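
As a rough sketch of how positional references carry into later steps, the relation C can be filtered directly and the positions renamed afterwards. The aliases keep_repeated and named, and the field name cnt, are hypothetical names chosen for illustration:

```pig
-- C has two unnamed fields: $0 is the group key, $1 is the count
C = FOREACH B GENERATE group, COUNT(A);

-- Keep only the keys that occur more than once
keep_repeated = FILTER C BY $1 > 1;

-- Positions can be given names at any later point
named = FOREACH keep_repeated GENERATE $0 AS usr, $1 AS cnt;
```

Positional references stay valid only as long as the column order does not change, so naming the fields is usually safer in longer scripts.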

Upvotes: 2

Davis Broda

Reputation: 4125

Name the columns when they are created in the following manner:

C = FOREACH B GENERATE group as usr,COUNT(A) as countA;

They can then be referred to by these names in later statements, as in the following example:

D = FOREACH C GENERATE usr, countA;
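
There is no need to store C and load it again: a relation can be used as the input of any later statement in the same script. The sketch below is one possible continuation. It assumes usr is loaded as chararray (the sample data shows letters in that column) and that users.txt is tab-delimited, which is the default for PigStorage. The aliases repeated and sorted are hypothetical:

```pig
-- Assumption: tab-delimited input, usr holds strings such as A, B, C
A = LOAD 'users.txt' AS (usr: chararray, nod: int);
B = GROUP A BY usr;
C = FOREACH B GENERATE group AS usr, COUNT(A) AS countA;

-- Print the schema of C to check the field names and types
DESCRIBE C;

-- Use C directly in further processing
repeated = FILTER C BY countA > 1;
sorted = ORDER repeated BY countA DESC;
DUMP sorted;
```

DUMP prints each row as a parenthesized tuple, which may be why the output looked like key-value pairs. The fields are ordinary columns.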

Upvotes: 2
