systemjack
systemjack

Reputation: 2985

Count sessions per minute derived from start and end timespans

I have a table with records for user activity that cover a span indicated by a start and end time. Am looking for a count of users active in the system per unit of time over the course of the previous day.

The maximum session length is an hour and they don't cross hour boundaries. Sessions can end and a new one begin in the same minute.

Here's a stripped down version of the query:

with minutes AS (
    -- ignore this...it generates a day's worth of timestamps for each minute
    -- it's hairy but is what I'm stuck with on redshift
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
        from seed_table limit 1440
),
sessions as (
    select sid, ts_start, ts_end
    from user_sessions s
    where ts_end >= sysdate::date-'1 day'::interval 
        and ts_start < sysdate::date
)
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1

I'm trying to avoid that nasty left join:

->  XN Nested Loop Left Join DS_BCAST_INNER  (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
    Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))

Here's what almost worked for me based on Gordon Linoff's answer. It under counts when sessions for a user transition within a minute of each other. Seems like the right direction though. The original query could over count for the same reason but the opportunity to get the count of distinct session IDs for a minute solves that.

select minute, sum(count) over (order by minute rows unbounded preceding) as users
from (
    select minute, sum(count) as count
    from (
        (
            select date_trunc('minute', ts_start) as minute, count(*) as count
            from sessions
            group by 1
        ) union all (
            select date_trunc('minute', ts_end) as minute, - count(*) as count
            from sessions
            group by 1
        )
    ) s1
    group by minute
) s2
order by minute;

For comparison, here are the timing results for an hour's worth of data:

  1. Original Query Time: 81301.345 ms
  2. Sum-Over Query Time: 36242.342 ms

Upvotes: 0

Views: 1451

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You can do this much faster by counting the number of starts and stops in each minute, and then taking the cumulative sum. The result is something like this:

select minute, sum(cnt) over (order by minute)
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt
       from sessions
       group by 1
      ) union all
      (select date_trunc('minute', ts_end), - count(*)
       from sessions
       group by 1
      )
     ) s
group by minute
order by minute;

Upvotes: 2

Related Questions