Reputation: 113
I am working with a simple dataset like this:
Item-Sold Date
Desk A 2/1/2014
Desk A 2/1/2014
Desk A 2/1/2014
Desk A 2/1/2014
Desk B 2/1/2014
Desk C 2/1/2014
Chair A 2/2/2014
Chair B 2/2/2014
Chair B 2/2/2014
I need help writing a Pig Latin query to find the number of unique items sold on each date.
So my output would be:
Date Unique-Items-Sold
2/1/2014 3
2/2/2014 2
I am having trouble writing a statement that produces this result. Any help would be appreciated. Thank you.
Upvotes: 0
Views: 231
Reputation: 603
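-- unique_count.pig
-- Assumes items.csv is comma-separated with two columns: item, date.
items = LOAD 'items.csv' USING PigStorage(',') AS (item:chararray, date:chararray);
-- Collect all rows that share the same sale date.
grpd = GROUP items BY date;
-- Within each date, deduplicate the item names and count what remains.
distinct_cnt = FOREACH grpd {
    it = items.item;
    unique_it = DISTINCT it;
    GENERATE group AS date, COUNT(unique_it) AS unique_items_sold;
};
DUMP distinct_cnt;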
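If you want to sanity-check the numbers outside Pig, here is a minimal Python sketch of the same group-then-distinct-count logic, run on the sample rows from the question. This is not Pig and not part of the script above; the function name `unique_items_by_date` and the hard-coded `rows` list are just illustrative assumptions.

```python
from collections import defaultdict

# Sample rows from the question, as (item, date) pairs.
rows = [
    ("Desk A", "2/1/2014"),
    ("Desk A", "2/1/2014"),
    ("Desk A", "2/1/2014"),
    ("Desk A", "2/1/2014"),
    ("Desk B", "2/1/2014"),
    ("Desk C", "2/1/2014"),
    ("Chair A", "2/2/2014"),
    ("Chair B", "2/2/2014"),
    ("Chair B", "2/2/2014"),
]


def unique_items_by_date(rows):
    """Return {date: number of distinct items sold on that date}."""
    items_per_date = defaultdict(set)  # a set drops duplicate item names
    for item, date in rows:
        items_per_date[date].add(item)
    return {date: len(items) for date, items in items_per_date.items()}


print(unique_items_by_date(rows))
# prints {'2/1/2014': 3, '2/2/2014': 2}
```

The set per date plays the role of DISTINCT inside the nested FOREACH, and `len` plays the role of COUNT.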
Hope this helps!!
Upvotes: 1