Wiliam
Wiliam

Reputation: 3754

BigQuery: get offline users at a point in time and the time they have been offline

Let's assume that we have a table in BigQuery with the following data:

time  | user | status
 0:00 | 1    | start    <-- user 1 is online
 3:00 | 1    | stop     <-- user 1 is disconnected
 3:10 | 1    | stop     <-- a user can have multiple stops
 3:20 | 1    | stop     <-- a user can have multiple stops
 3:15 | 2    | start    <-- user 2 connected
 4:00 | 2    | stop     <-- user 2 disconnected
 5:00 | 1    | start    <-- user 1 is online again
 5:10 | 2    | stop     <-- user 2 still offline
 9:00 | 1    | start    <-- user 1 connected
10:00 | 1    | stop     <-- user 1 disconnected

I want to know with a query which users were offline at 08:00 and how much time they were offline (elapsed time from 00:08 to the first STOP after a START).

So the result might be:

user | status | time (sec)
1    | start  | 180          <-- 3 minutes online since the 05:00 start
2    | stop   | 170          <-- 4 minutes offline since the first 04:00 stop

I have tried with aggregation, last_value over, lag over but I can only get the previous event.

Upvotes: 0

Views: 55

Answers (1)

guillaume blaquiere
guillaume blaquiere

Reputation: 75735

It's a cool challenge! Here what I propose for the user connected at a specific date

    WITH
  #Select all the previous state for each user
  select_previous AS (
  SELECT
    time,
    user,
    state,
    LAG(state) OVER (PARTITION BY user ORDER BY time) AS previous_state
  FROM
    `gbl-imt-homerider-basguillaueb.bqml_test.user_up_sof`),
  #Keep only the state change date
  state_change AS (
  SELECT
    *
  FROM
    select_previous
  WHERE
    previous_state != state
    OR previous_state IS NULL ),
  #Select the latest time registered before the wanted timeline
  max_time AS(
  SELECT
    MAX(time) AS time,
    user
  FROM
    state_change
  WHERE
    time < TIMESTAMP("2019-08-01 00:08:00")
  GROUP BY
    user)
  #Join the state change list with the latest time registered per user. Keep only the state stop
SELECT
  state_change.*
FROM
  max_time
JOIN
  state_change
ON
  max_time.time = state_change.time
  AND max_time.user = state_change.user
WHERE
  state = "stop"

There is maybe better, but it works.

Same thing for the offline time. Some boundaries has to be added, but the core of the resquest is here:

WITH
  #Select all the previous state for each user
  select_previous AS (
  SELECT
    time,
    user,
    state,
    LAG(state) OVER (PARTITION BY user ORDER BY time) AS previous_state
  FROM
    `gbl-imt-homerider-basguillaueb.bqml_test.user_up_sof`),
  #Keep only the state change date
  state_change AS (
  SELECT
    *
  FROM
    select_previous
  WHERE
    previous_state != state
    OR previous_state IS NULL ),
  #get the previous date change
  time_previous_change AS (
  SELECT
    *,
    LAG(time) OVER (PARTITION BY user ORDER BY time) AS previous_time
  FROM
    state_change )
  #perform a sum of the timeDiff, only when the state is start and the previous is stop (offline time)
SELECT
  user,
  SUM(timestamp_diff (time,
      previous_time,
      second)) AS elapsed_in_second
FROM
  time_previous_change
WHERE
  state="start"
  AND previous_state="stop"
GROUP BY
  user

Not sure to understand the example that you provide. I can tune more these queries if you need.

Upvotes: 1

Related Questions