Reputation: 73
I am learning Hadoop pig and I always stuck at referencing the elements.please find the below example.
groupwordcount: {group: chararray,words: {(bag_of_tokenTuples_from_line::token: chararray)}}
Can somebody please explain how to reference the elements if we have nested tuples and bags.
Any Links for better understanding the nested referrencing would be great help.
Upvotes: 2
Views: 127
Reputation: 396
Let's do a simple Demonstration to understand this problem.
say a file 'a.txt' stored at '/tmp/a.txt' folder in HDFS
A = LOAD '/tmp/a.txt' using PigStorage(',') AS (name:chararray,term:chararray,gpa:float);
Dump A;
(John,fl,3.9)
(John,fl,3.7)
(John,sp,4.0)
(John,sm,3.8)
(Mary,fl,3.8)
(Mary,fl,3.9)
(Mary,sp,4.0)
(Mary,sm,4.0)
Now let's group by this Alias 'A' on the basis of some parameter say name and term
B = GROUP A BY (name,term);
dump B;
((John,fl),{(John,fl,3.7),(John,fl,3.9)})
((John,sm),{(John,sm,3.8)})
((John,sp),{(John,sp,4.0)})
((Mary,fl),{(Mary,fl,3.9),(Mary,fl,3.8)})
((Mary,sm),{(Mary,sm,4.0)})
((Mary,sp),{(Mary,sp,4.0)})
describe B;
B: {group: (name: chararray,term: chararray),A: {(name: chararray,term: chararray,gpa: float)}}
now it has become the problem statement that you have asked. Let me demonstrate you how to access elements of group tuple or element of A tuple or both
C = foreach B generate group.name,group.term,A.name,A.term,A.gpa;
dump C;
(John,fl,{(John),(John)},{(fl),(fl)},{(3.7),(3.9)})
(John,sm,{(John)},{(sm)},{(3.8)})
(John,sp,{(John)},{(sp)},{(4.0)})
(Mary,fl,{(Mary),(Mary)},{(fl),(fl)},{(3.9),(3.8)})
(Mary,sm,{(Mary)},{(sm)},{(4.0)})
(Mary,sp,{(Mary)},{(sp)},{(4.0)})
So we accessed all elements by this way.
hope this helped
Upvotes: 1