Reputation: 1
I am new to Pig Latin and am trying the example below using Pig's built-in functions.
A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float);
B = GROUP A BY name;
DUMP B;
(John,{(John,sm,3.8),(John,sp,4.0),(John,wt,3.7),(John,fl ,3.9)})
(Mary,{(Mary,sm,4.0),(Mary,sp,4.0),(Mary,wt,3.9),(Mary,fl,3.8)})
I need to retrieve the first element => (John,sm,3.8)
and the last element => (John,fl ,3.9)
from each bag.
How can I do this without using a UDF?
Upvotes: 0
Views: 2517
Reputation: 3599
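You can use the solution below, though it is a little lengthy. It uses RANK to tag each row with a sequential row_id, then picks the rows with the smallest and largest row_id within each group.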
-- Load the comma-separated input file.
names = LOAD '/user/user/inputfiles/names.txt' USING PigStorage(',') AS (name:chararray, term:chararray, gpa:float);
-- RANK without a BY clause prepends a sequential row number to each tuple.
names_rank = RANK names;
names_each = FOREACH names_rank GENERATE $0 AS row_id, name, term, gpa;
names_grp = GROUP names_each BY name;
-- First record per name: the tuple with the smallest row_id in the group.
names_first_each = FOREACH names_grp {
    order_asc = ORDER names_each BY row_id ASC;
    first_rec = LIMIT order_asc 1;
    GENERATE FLATTEN(first_rec) AS (row_id, name, term, gpa);
};
-- Last record per name: the tuple with the largest row_id in the group.
names_last_each = FOREACH names_grp {
    order_desc = ORDER names_each BY row_id DESC;
    last_rec = LIMIT order_desc 1;
    GENERATE FLATTEN(last_rec) AS (row_id, name, term, gpa);
};
-- Combine both results, drop the helper row_id, and sort by name for display.
names_unioned = UNION names_first_each, names_last_each;
names_extract = FOREACH names_unioned GENERATE name, term, gpa;
names_ordered = ORDER names_extract BY name;
DUMP names_ordered;
Output:
(John,fl,3.9)
(John,sm,3.8)
(Mary,fl,3.8)
(Mary,sm,4.0)
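If the two nested sorts and the UNION feel heavy, one possible variant is to compute the smallest and largest row_id per group with the built-in MIN and MAX, then join back to pick the matching rows. This is only a sketch: it assumes names_each and names_grp are defined exactly as in the script above, and the aliases bounds, joined, picked, and first_last are names I made up for illustration.

```pig
-- Assumes names_each (row_id, name, term, gpa) and names_grp from the script above.
-- For each name, find the row_id of its first and last record.
bounds = FOREACH names_grp GENERATE
    group AS name,
    MIN(names_each.row_id) AS first_id,
    MAX(names_each.row_id) AS last_id;

-- Join the per-name bounds back onto the ranked rows.
joined = JOIN names_each BY name, bounds BY name;

-- Keep only the rows whose row_id matches either bound.
picked = FILTER joined BY names_each::row_id == bounds::first_id
                       OR names_each::row_id == bounds::last_id;

-- Drop the helper columns.
first_last = FOREACH picked GENERATE
    names_each::name AS name,
    names_each::term AS term,
    names_each::gpa AS gpa;

DUMP first_last;
```

A name with a single record yields one row here rather than two, since its first and last row_id are the same. The order of the dumped rows is not guaranteed unless you add an ORDER step.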
Upvotes: 1