marchelbling
marchelbling

Reputation: 1939

Sliding window with fixed duration in SQL/BigQuery

I have a time series (userid, timestamp) and I'd like to determine sessions.

A session is defined at the user level by a fixed window of 5min which starts for each new row that is not in a previous session (the window duration would ideally depend on the number of previous rows for a given user but for now I can live with a fixed window duration) e.g.

WITH sample AS (
  SELECT 0 user_id, TIMESTAMP('2020-01-01T00:00:00Z') timestamp UNION ALL -- new session
  SELECT 1, TIMESTAMP('2020-01-01T00:00:00Z') UNION ALL -- new session
  SELECT 1, TIMESTAMP('2020-01-01T00:04:00Z') UNION ALL
  SELECT 1, TIMESTAMP('2020-01-01T00:06:00Z') UNION ALL -- new session
  SELECT 1, TIMESTAMP('2020-01-01T00:10:00Z') UNION ALL
  SELECT 1, TIMESTAMP('2020-01-01T00:11:00Z') UNION ALL -- new session
  SELECT 1, TIMESTAMP('2020-01-01T01:00:00Z')           -- new session
)
SELECT * 
FROM sample;

I'm stuck since, in my reasoning, determining whether a row starts a new session depends on the "new session" column value from previous rows.

I'm targeting BigQuery so ideally it should be doable in BQ syntax.

Thanks for any help/hint!

Upvotes: 0

Views: 638

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270021

I think you want lag() and a cumulative sum:

select s.*,
       sum(case when prev_ts > timestamp_add(timestamp, interval -5 minute)
                then 0 else 1  -- "1" starts a new session
           end) over (partition by user_id order by timestamp) as session_num
from (select s.*,
             lag(timestamp) over (partition by user_id order by timestamp) as prev_ts
      from sample s
     ) s

Upvotes: 2

Related Questions