Reputation: 466
I'm trying to find the total count of active users in a database. "Active" users here as defined as those who have registered an event on the selected day or later than the selected day. So if a user registered an event on days 1, 2 and 5, they are counted as "active" throughout days 1, 2, 3, 4 and 5.
My original dataset looks like this (note that this is a sample - the real dataset will run to up to 365 days, and has around 1000 users).
Day ID
0 1
0 2
0 3
0 4
0 5
1 1
1 2
2 1
3 1
4 1
4 2
As you can see, all 5 IDs are active on Day 0
, and 2 IDs (1
and 2
) are active until Day 4
, so I'd like the finished table to look like this:
Day Count
0 5
1 2
2 2
3 2
4 2
I've tried using the following query:
select Day as days, sum(case when Day <= days then 1 else 0 end)
from df
But it gives incorrect output (only counts users who were active on each specific days).
I'm at a loss as to what I could try next. Does anyone have any ideas? Many thanks in advance!
Upvotes: 1
Views: 2267
Reputation: 1270713
I think I would just use generate_series()
:
select gs.d, count(*)
from (select id, min(day) as min_day, max(day) as max_day
from t
group by id
) t cross join lateral
generate_series(t.min_day, .max_day, 1) gs(d)
group by gs.d
order by gs.d;
If you want to count everyone as active from day 1 -- but not all have a value on day 1 -- then use 1
instead of min_day
.
Here is a db<>fiddle.
Upvotes: 1
Reputation: 1334
A bit verbose, but this should do:
with dt as (
select 0 d, 1 id
union all
select 0 d, 2 id
union all
select 0 d, 3 id
union all
select 0 d, 4 id
union all
select 0 d, 5 id
union all
select 1 d, 1 id
union all
select 1 d, 2 id
union all
select 2 d, 1 id
union all
select 3 d, 1 id
union all
select 4 d, 1 id
union all
select 4 d, 2 id
)
, active_periods as (
select id
, min(d) min_d
, max(d) max_d
from dt
group by id
)
, days as (
select distinct d
from dt
)
select d.d
, count(ap.id)
from days d
join active_periods ap on d.d between ap.min_d and ap.max_d
group by 1
order by 1 asc
Upvotes: 0