minamino
minamino

Reputation: 115

Query distinct in google bigquery

I have a table with four columns, looking like this:

id,name, key, date
 1,'A' ,'x1','2015-11-11'
 2,'A' ,'x1','2015-11-11'
 3,'B' ,'x2','2015-11-11'
 4,'B' ,'x2','2015-11-11'
 5,'A' ,'x1','2015-11-12'
 6,'A' ,'x1','2015-11-12'
 7,'B' ,'x2','2015-11-12'
 8,'B' ,'x2','2015-11-12'
 9,'D' ,'x3','2015-11-12'
 10,'A' ,'x1','2015-12-11'
 11,'A' ,'x1','2015-12-11'
 12,'B' ,'x2','2015-12-11'
 13,'B' ,'x2','2015-12-11'
 14,'A' ,'x1','2015-12-12'
 15,'A' ,'x1','2015-12-12'
 16,'B' ,'x2','2015-12-12'
 17,'B' ,'x2','2015-12-12'
 18,'D' ,'x3','2015-12-12'

I want to count the number of distinct new key-s for each date:

2015-11-11  2                     -- (two distinct keys: x1 and x2)
2015-11-12  1                     -- (one new key: x3)
2015-12-11  2                     -- (two distinct keys: x1 and x2) - (different month 11)
2015-12-12  1                     -- (one new key: x3) - (different month 11)

Only distinct in each month.

How can I do this?

Upvotes: 0

Views: 1296

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

it is exactly as in you previous question(s) - you just need to add extra group/partition by month --> see use of YearMonth field

SELECT DATE, EXACT_COUNT_DISTINCT(key) AS keys
FROM (
  SELECT DATE, key, LEAD(DATE) OVER(PARTITION BY key, YearMonth ORDER BY DATE DESC) AS new
  FROM (SELECT DATE, LEFT(DATE, 7) AS YearMonth, key FROM YourTable GROUP BY 1, 2, 3)
) WHERE new IS NULL
GROUP BY DATE 
ORDER BY DATE

Upvotes: 1

Mosha Pasumansky
Mosha Pasumansky

Reputation: 13994

Here is how I tried to solve it:

  1. Extract month from date:

    select key, date, regexp_extract(date, r'[\d]+-(\d\d)-\d\d') month from t

  2. Partition by month since we want to start counts from beginning every month, and compute distinct keys from beginning of the month

    select date, month, count(distinct key) over (partition by month order by date rows between unbounded preceding and current row) cd from (select key, date, regexp_extract(date, r'[\d]+-(\d\d)-\d\d') month from t)

  3. Pick the total count distincts per month

    select date, month, max(cd) cd from ( select date, month, count(distinct key) over (partition by month order by date rows between unbounded preceding and current row) cd from (select key, date, regexp_extract(date, r'[\d]+-(\d\d)-\d\d') month from t)) group by 1, 2

  4. For every date compute number of total unique keys from the beginning of month for previous date:

    select date, cd, lag(cd, 1) over (partition by month order by date) prev_cd from ( select date, month, max(cd) cd from ( select date, month, count(distinct key) over (partition by month order by date rows between unbounded preceding and current row) cd from (select key, date, regexp_extract(date, r'[\d]+-(\d\d)-\d\d') month from t)) group by 1, 2)

  5. Subtract previous date from current one - that's the answer:

    select date, cd - prev_cd from ( select date, cd, lag(cd, 1) over (partition by month order by date) prev_cd from ( select date, month, max(cd) cd from ( select date, month, count(distinct key) over (partition by month order by date rows between unbounded preceding and current row) cd from (select key, date, regexp_extract(date, r'[\d]+-(\d\d)-\d\d') month from moshap.wd)) group by 1, 2))

Upvotes: 0

Related Questions