Reputation: 2985
I have a table with records for user activity that cover a span indicated by a start and end time. Am looking for a count of users active in the system per unit of time over the course of the previous day.
The maximum session length is an hour and they don't cross hour boundaries. Sessions can end and a new one begin in the same minute.
Here's a stripped down version of the query:
with minutes AS (
-- ignore this...it generates a day's worth of timestamps for each minute
-- it's hairy but is what I'm stuck with on redshift
select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
from seed_table limit 1440
),
sessions as (
select sid, ts_start, ts_end
from user_sessions s
where ts_end >= sysdate::date-'1 day'::interval
and ts_start < sysdate::date
)
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1
I'm trying to avoid that nasty left join:
-> XN Nested Loop Left Join DS_BCAST_INNER (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))
Here's what almost worked for me based on Gordon Linoff's answer. It under counts when sessions for a user transition within a minute of each other. Seems like the right direction though. The original query could over count for the same reason but the opportunity to get the count of distinct session IDs for a minute solves that.
select minute, sum(count) over (order by minute rows unbounded preceding) as users
from (
select minute, sum(count) as count
from (
(
select date_trunc('minute', ts_start) as minute, count(*) as count
from sessions
group by 1
) union all (
select date_trunc('minute', ts_end) as minute, - count(*) as count
from sessions
group by 1
)
) s1
group by minute
) s2
order by minute;
For comparison, here are the timing results for an hour's worth of data:
Upvotes: 0
Views: 1451
Reputation: 1269873
You can do this much faster by counting the number of starts and stops in each minute, and then taking the cumulative sum. The result is something like this:
select minute, sum(cnt) over (order by minute)
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt
from sessions
group by 1
) union all
(select date_trunc('minute', ts_end), - count(*)
from sessions
group by 1
)
) s
group by minute
order by minute;
Upvotes: 2