Reputation: 25
I am trying to count the number of unique IDs in a table for each month. But the catch is that the count of IDs for each month should only include IDs which were not present in the previous month
I am trying to write an SQL query which will work in google BigQuery but so far I have only figured out how to get the count of distinct IDs for each month. I am not able to figure out how to get the condition for the IDs not being present in the previous month.
For e.g. I have a table like below tbl1:
time_stamp | ID | col3 | col4
-------------------------------
2019-06-10 | 1 | 10 | 20
2019-06-10 | 2 | 11 | 21
2019-06-10 | 3 | 12 | 22
2019-07-10 | 2 | 11 | 21
2019-07-10 | 4 | 13 | 23
2019-08-10 | 4 | 13 | 23
2019-08-10 | 5 | 14 | 24
2019-09-10 | 5 | 14 | 24
2019-09-10 | 6 | 15 | 25
Expected Output
time_stamp | count
--------------------
2019-06-10 | 3
2019-07-10 | 1
2019-08-10 | 1
2019-09-10 | 1
Upvotes: 0
Views: 1275
Reputation: 173141
Update
I realized - you asked for count of IDs for each month should only include IDs which were not present in the
previous month - not in previous months, but month
Below is solution for it
#standardSQL
SELECT month, COUNT(1) users
FROM (
SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified
FROM (
SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table`
)
)
WHERE qualified
GROUP BY month
you can test, play with it using below sample data
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
SELECT '2019-06-10', 2, 11, 21 UNION ALL
SELECT '2019-06-10', 3, 12, 22 UNION ALL
SELECT '2019-06-11', 3, 12, 22 UNION ALL
SELECT '2019-07-10', 2, 11, 21 UNION ALL
SELECT '2019-07-10', 4, 13, 23 UNION ALL
SELECT '2019-08-10', 1, 13, 23 UNION ALL
SELECT '2019-08-10', 4, 13, 23 UNION ALL
SELECT '2019-08-10', 5, 14, 24 UNION ALL
SELECT '2019-09-10', 5, 14, 24 UNION ALL
SELECT '2019-09-10', 6, 15, 25
)
SELECT month, COUNT(1) users
FROM (
SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified
FROM (
SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table`
)
)
WHERE qualified
GROUP BY month
-- ORDER BY month
with result
Row month users
1 2019-06-01 3
2 2019-07-01 1
3 2019-08-01 2
4 2019-09-01 1
Hope, this time it is what you asked!
Initial answer Below is for BigQuery Standard SQL and returns count of users which are not presented in prev months
#standardSQL
SELECT time_stamp, COUNT(1) `count`
FROM (
SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY time_stamp
if to apply to sample data from your question - output is
Row time_stamp count
1 2019-06-10 3
2 2019-07-10 1
3 2019-08-10 1
4 2019-09-10 1
You can test, play with it using below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
SELECT '2019-06-10', 2, 11, 21 UNION ALL
SELECT '2019-06-10', 3, 12, 22 UNION ALL
SELECT '2019-07-10', 2, 11, 21 UNION ALL
SELECT '2019-07-10', 4, 13, 23 UNION ALL
SELECT '2019-08-10', 4, 13, 23 UNION ALL
SELECT '2019-08-10', 5, 14, 24 UNION ALL
SELECT '2019-09-10', 5, 14, 24 UNION ALL
SELECT '2019-09-10', 6, 15, 25
)
SELECT time_stamp, COUNT(1) `count`
FROM (
SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY time_stamp
-- ORDER BY time_stamp
In case if you need to group by month vs. by date (it is not clear from your question)
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
SELECT '2019-06-11', 2, 11, 21 UNION ALL
SELECT '2019-06-12', 3, 12, 22 UNION ALL
SELECT '2019-07-10', 2, 11, 21 UNION ALL
SELECT '2019-07-11', 4, 13, 23 UNION ALL
SELECT '2019-08-10', 4, 13, 23 UNION ALL
SELECT '2019-08-12', 5, 14, 24 UNION ALL
SELECT '2019-09-10', 5, 14, 24 UNION ALL
SELECT '2019-09-13', 6, 15, 25
)
SELECT DATE_TRUNC(time_stamp, MONTH) month, COUNT(1) `count`
FROM (
SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY month
-- ORDER BY month
above returns monthly users excluding those which were present in previous months
Row month count
1 2019-06-01 3
2 2019-07-01 1
3 2019-08-01 1
4 2019-09-01 1
Upvotes: 1
Reputation: 1270723
You can use two levels of aggregation:
select yyyymm, count(*)
from (select id, date_trunc(min(time_stamp), month) as yyyymm
from tbl1
group by id
) t
group by yyyymm
order by yyyymm;
Upvotes: 1