Reputation: 156
Is there a good way to select distinct rows from a table in Pig Latin? For example, say I have the rows (1, 2, 3); (2, 5, 1); (1, 2, 3), but I want only (1, 2, 3); (2, 5, 1).
Upvotes: 0
Views: 164
Reputation: 458
Yes. Pig Latin has a relational operator, DISTINCT, that removes duplicate tuples from a relation, which is exactly what you want here.
For example:
-- assume input is:
-- 1,2,3
-- 2,5,1
-- 1,2,3
data = LOAD 'input' USING PigStorage(',') AS (val1:int,val2:int,val3:int);
data2 = DISTINCT data;
-- produces (row order is not guaranteed):
-- (1,2,3)
-- (2,5,1)
DUMP data2;
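Note that DISTINCT compares whole tuples, so it cannot be applied to only some of the fields directly. If you only care about uniqueness on a subset of columns, one common approach is to project those columns first and then apply DISTINCT. The following is a minimal sketch, assuming the same 'input' file and schema as above; the relation names pairs and unique_pairs are just illustrative:

```pig
-- Assumption: same comma-separated 'input' file and schema as in the example above.
data = LOAD 'input' USING PigStorage(',') AS (val1:int, val2:int, val3:int);

-- DISTINCT works on entire tuples, so keep only the fields of interest first.
pairs = FOREACH data GENERATE val1, val2;

-- Remove duplicate (val1, val2) tuples.
unique_pairs = DISTINCT pairs;

DUMP unique_pairs;
```

For the sample input this should leave the two tuples (1,2) and (2,5), though the order in which they are printed is not guaranteed.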
Upvotes: 2