Mahi

Reputation: 73

Pig referencing

I am learning Hadoop Pig and I always get stuck when referencing elements. Please see the example below.

groupwordcount: {group: chararray,words: {(bag_of_tokenTuples_from_line::token: chararray)}}

Can somebody please explain how to reference the elements when we have nested tuples and bags?

Any links for better understanding nested referencing would be a great help.

Upvotes: 2

Views: 127

Answers (1)

Anurag Yadav

Reputation: 396

Let's work through a simple demonstration to understand this problem.

Say a file 'a.txt' is stored at '/tmp/a.txt' in HDFS:

A = LOAD '/tmp/a.txt' using PigStorage(',') AS (name:chararray,term:chararray,gpa:float);

Dump A;

(John,fl,3.9)

(John,fl,3.7)

(John,sp,4.0)

(John,sm,3.8)

(Mary,fl,3.8)

(Mary,fl,3.9)

(Mary,sp,4.0)

(Mary,sm,4.0)

Now let's group alias 'A' by some key, say name and term:

B = GROUP A BY (name,term);

dump B;

((John,fl),{(John,fl,3.7),(John,fl,3.9)})

((John,sm),{(John,sm,3.8)})

((John,sp),{(John,sp,4.0)})

((Mary,fl),{(Mary,fl,3.9),(Mary,fl,3.8)})

((Mary,sm),{(Mary,sm,4.0)})

((Mary,sp),{(Mary,sp,4.0)})

describe B;

B: {group: (name: chararray,term: chararray),A: {(name: chararray,term: chararray,gpa: float)}}
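The schema above also tells you the position of each field, so the same elements can be referenced by position instead of by name. A minimal sketch, assuming the relation B defined above (the aliases B_outer and B_inner are made up for illustration):

```pig
-- At the top level of B, $0 is the 'group' tuple and $1 is the bag 'A'.
B_outer = FOREACH B GENERATE $0, $1;

-- Inside a tuple or bag, the dot operator projects a field by name or by position:
-- group.$0 is group.name, group.$1 is group.term, A.$2 is A.gpa.
B_inner = FOREACH B GENERATE group.$0, group.$1, A.$2;
```

Positional references are handy when a relation has no declared schema, but names are usually easier to read.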

This now has the same shape as the schema in your question. Here is how to access the fields of the group tuple, the fields of the tuples in bag A, or both:

C = foreach B generate group.name,group.term,A.name,A.term,A.gpa;

dump C;

(John,fl,{(John),(John)},{(fl),(fl)},{(3.7),(3.9)})

(John,sm,{(John)},{(sm)},{(3.8)})

(John,sp,{(John)},{(sp)},{(4.0)})

(Mary,fl,{(Mary),(Mary)},{(fl),(fl)},{(3.9),(3.8)})

(Mary,sm,{(Mary)},{(sm)},{(4.0)})

(Mary,sp,{(Mary)},{(sp)},{(4.0)})

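This way we accessed every element: group.name and group.term return single values from the group tuple, while A.name, A.term and A.gpa each return a bag of single-field tuples.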
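Once you can project a bag, the usual next step is either to aggregate over it or to un-nest it. A minimal sketch, assuming the relation B from above (the aliases D and E and the output field names n and avg_gpa are made up for illustration):

```pig
-- Aggregate over the inner bag: one output row per (name, term) group.
-- COUNT and AVG are Pig built-in functions that take a bag as input.
D = FOREACH B GENERATE group.name AS name,
                       group.term AS term,
                       COUNT(A)   AS n,
                       AVG(A.gpa) AS avg_gpa;

-- Un-nest: FLATTEN turns the group tuple into top-level fields
-- and emits one output row per tuple in the projected bag.
E = FOREACH B GENERATE FLATTEN(group), FLATTEN(A.gpa);
```

The same pattern should carry over to the relation in the question, for example FOREACH groupwordcount GENERATE group, COUNT(words); to count the tokens in each group. The token field inside the bag can normally be reached as words.token, as long as the short name is unambiguous without its bag_of_tokenTuples_from_line:: prefix.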

Hope this helps.

Upvotes: 1
