nyc0202034

Reputation: 113

How to properly aggregate unique count with Apache Pig?

I am working with a simple dataset like this:

Item-Sold      Date
Desk A       2/1/2014
Desk A       2/1/2014
Desk A       2/1/2014
Desk A       2/1/2014
Desk B       2/1/2014
Desk C       2/1/2014
Chair A      2/2/2014
Chair B      2/2/2014
Chair B      2/2/2014

I need help writing a Pig Latin query that finds the number of unique items sold on each date.

So my output would be:

Date      Unique-Items-Sold
2/1/2014         3
2/2/2014         2

I am having trouble writing a statement that works. Any help would be appreciated. Thank you.

Upvotes: 0

Views: 231

Answers (1)

Magham Ravi

Reputation: 603

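    -- unique_count.pig
    -- Assumes items.csv is comma-delimited with two fields (item, date) and no header row.
    items = LOAD 'items.csv' USING PigStorage(',') AS (item:chararray, date:chararray);
    -- One group per date; each group holds a bag of that date's rows.
    grpd = GROUP items BY date;
    distinct_cnt = FOREACH grpd {
        it = items.item;            -- project only the item names
        unique_it = DISTINCT it;    -- drop duplicate items within this date
        GENERATE group AS date, COUNT(unique_it) AS unique_items_sold;
    };
    DUMP distinct_cnt;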
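
As an alternative sketch (not part of the original script), the same counts can be obtained by removing duplicate rows first and then grouping. This assumes the same comma-delimited `items.csv` with only the two fields `item` and `date` and no header row; the relation names `pairs`, `by_date`, and `counts` are made up for illustration.

```pig
-- distinct_then_count.pig (hypothetical file name)
-- Assumes items.csv is comma-delimited with two fields (item, date) and no header row.
items = LOAD 'items.csv' USING PigStorage(',') AS (item:chararray, date:chararray);

-- DISTINCT compares whole tuples, so this keeps one row per unique (item, date) pair.
pairs = DISTINCT items;

-- After de-duplication, the row count per date is the number of unique items.
by_date = GROUP pairs BY date;
counts = FOREACH by_date GENERATE group AS date, COUNT(pairs) AS unique_items_sold;

DUMP counts;
```

For the sample data in the question, both versions should yield a count of 3 for 2/1/2014 and 2 for 2/2/2014. The nested `FOREACH` form above is usually the better fit when the relation carries extra columns that should not take part in the de-duplication.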

Hope this helps!!

Upvotes: 1
