Reputation: 97
Dataset - Contains PostId and userID
20 1
21 2
45 3
85 1
48 1
98 1
74 1
96 2
63 2
33 3
44 3
55 3
66 3
77 3
I want to access the userID with maximum no. of post
PIG code
A = load '/home/cloudera/Desktop/post.txt' as (postid:chararray, userid:chararray);
B = load '/home/cloudera/Desktop/user.txt' as (name:chararray, id:chararray);
C = group A by userid;
D = foreach C generate group,COUNT(A.postid) as count;
E = order D by count DESC;
F = limit D 1;
It gives output -
(3,6)
Now what should be the PIG statement to access username from user.txt whose id is same as A.userid after execution of F statement?
Upvotes: 0
Views: 173
Reputation: 11080
Add another statement to get the first column from relation F
G = FOREACH F GENERATE $0;
DUMP G;
Upvotes: 1
Reputation: 212
use the below SQL statement to get that desired output
declare @var int = (select max(cn) from (select count(post) cn from temp group by userid) c)
select * from (select userid,count(post) as pso from temp group by userid ) as c where pso = @var
Upvotes: 0