Reputation: 27
I have a table of users and how many events they fired on a given date:
DATE | USERID | EVENTS |
---|---|---|
2021-08-27 | 1 | 5 |
2021-07-25 | 1 | 7 |
2021-07-23 | 2 | 3 |
2021-07-20 | 3 | 9 |
2021-06-22 | 1 | 9 |
2021-05-05 | 1 | 4 |
2021-05-05 | 2 | 2 |
2021-05-05 | 3 | 6 |
2021-05-05 | 4 | 8 |
2021-05-05 | 5 | 1 |
I want to create a table showing the number of active users for each date, where an active user is someone who fired an event on that date or on any of the preceding 30 days.
DATE | ACTIVE_USERS |
---|---|
2021-08-27 | 1 |
2021-07-25 | 3 |
2021-07-23 | 2 |
2021-07-20 | 2 |
2021-06-22 | 1 |
2021-05-05 | 5 |
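To make the definition concrete, here is a minimal Python sketch (not Snowflake SQL) that brute-forces the expected output from the sample rows above. The function name `active_users_by_date` and the `window_days` parameter are made up for illustration only.

```python
from datetime import date, timedelta

# Sample rows copied from the table above: (date, userid, events)
ROWS = [
    (date(2021, 8, 27), 1, 5),
    (date(2021, 7, 25), 1, 7),
    (date(2021, 7, 23), 2, 3),
    (date(2021, 7, 20), 3, 9),
    (date(2021, 6, 22), 1, 9),
    (date(2021, 5, 5), 1, 4),
    (date(2021, 5, 5), 2, 2),
    (date(2021, 5, 5), 3, 6),
    (date(2021, 5, 5), 4, 8),
    (date(2021, 5, 5), 5, 1),
]


def active_users_by_date(rows, window_days=30):
    """For each date present in the data, count the distinct users with an
    event on that date or within the preceding `window_days` days."""
    result = {}
    for day in sorted({d for d, _, _ in rows}, reverse=True):
        earliest = day - timedelta(days=window_days)
        users = {u for d, u, _ in rows if earliest <= d <= day}
        result[day] = len(users)
    return result


for day, count in active_users_by_date(ROWS).items():
    print(day, count)
```

This compares every date against every row, so it is only useful as a reference for checking a query's output on small data.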
I tried the following query which returned only the users who were active on the specified date:
SELECT COUNT(DISTINCT USERID), DATE
FROM table
WHERE DATE >= (CURRENT_DATE() - interval '30 days')
GROUP BY 2 ORDER BY 2 DESC;
I also tried a window function with ROWS BETWEEN, but it seems to return the same result:
SELECT
DATE,
SUM(ACTIVE_USERS) AS ACTIVE_USERS
FROM
(
SELECT
DATE,
CASE
WHEN SUM(EVENTS) OVER (PARTITION BY USERID ORDER BY DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) >= 1 THEN 1
ELSE 0
END AS ACTIVE_USERS
FROM table
)
GROUP BY 1
ORDER BY 1
I'm using SQL:ANSI on Snowflake. Any suggestions would be much appreciated.
Upvotes: 0
Views: 886
Reputation: 1269973
This is tricky to do with window functions, because count(distinct)
is not permitted there. You can use a self-join:
select t1.date, count(distinct t2.userid) as active_users
from table t1 join
     table t2
     on t2.date <= t1.date and
        t2.date > t1.date - interval '30 day'
group by t1.date;
However, that can be expensive. One solution is to "unpivot" the data. That is, do an incremental count per user of going "in" and "out" of active states and then do a cumulative sum:
with d as ( -- calculate the dates with "ins" and "outs"
      select userid, date, +1 as inc
      from table
      union all
      select userid, date + interval '30 day', -1 as inc
      from table
     ),
     d2 as ( -- accumulate to get the net actives per day
      select date, userid, sum(inc) as change_on_day,
             sum(sum(inc)) over (partition by userid order by date) as running_inc,
             -- date of the user's next change; for the last row of a period this is the day the count drops to 0
             lead(date) over (partition by userid order by date) as next_date
      from d
      group by date, userid
     ),
     d3 as ( -- summarize into active periods, one row per user and period
      select userid, min(date) as start_date, max(next_date) as end_date
      from (select d2.*,
                   sum(case when running_inc = 0 then 1 else 0 end) over (partition by userid order by date) as active_period
            from d2
           ) d2
      where running_inc > 0
      group by userid, active_period
     )
select d.date, count(d3.userid) as active_users
from (select distinct date from table) d left join
     d3
     on d.date >= d3.start_date and d.date < d3.end_date
group by d.date;
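The "in"/"out" bookkeeping described above can be easier to follow outside SQL. Here is a minimal Python sketch of the same idea, not Snowflake code: each event adds +1 on its date and -1 thirty days later, a running sum per user marks where an active period opens and closes, and each date is then counted against the resulting periods. The names `active_periods`, `active_users_by_date` and `window_days` are hypothetical, and the sample rows come from the question. It uses the same half-open window as the query's join condition (`date >= start_date and date < end_date`).

```python
from collections import defaultdict
from datetime import date, timedelta

# (date, userid) pairs from the question's sample table
ROWS = [
    (date(2021, 8, 27), 1), (date(2021, 7, 25), 1), (date(2021, 7, 23), 2),
    (date(2021, 7, 20), 3), (date(2021, 6, 22), 1), (date(2021, 5, 5), 1),
    (date(2021, 5, 5), 2), (date(2021, 5, 5), 3), (date(2021, 5, 5), 4),
    (date(2021, 5, 5), 5),
]


def active_periods(event_dates, window_days=30):
    """Merge one user's event dates into half-open [start, end) periods."""
    changes = defaultdict(int)
    for d in event_dates:
        changes[d] += 1                                # user goes "in"
        changes[d + timedelta(days=window_days)] -= 1  # user goes "out"

    periods = []
    running = 0
    start = None
    for d in sorted(changes):
        if running == 0:
            start = d          # running sum leaves 0: a period opens
        running += changes[d]
        if running == 0:
            periods.append((start, d))  # running sum is back at 0: it closes
    return periods


def active_users_by_date(rows, window_days=30):
    """Count, for each date in the data, the users with a period covering it."""
    dates_by_user = defaultdict(list)
    for d, userid in rows:
        dates_by_user[userid].append(d)
    periods = [
        p
        for dates in dates_by_user.values()
        for p in active_periods(dates, window_days)
    ]
    return {
        day: sum(start <= day < end for start, end in periods)
        for day in sorted({d for d, _ in rows})
    }


for day, count in active_users_by_date(ROWS).items():
    print(day, count)
```

A user's periods never overlap, so counting the periods that cover a date is the same as counting distinct users.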
Upvotes: 2