Reputation: 3754
Let's assume that we have a table in BigQuery with the following data:
time | user | status
0:00 | 1 | start <-- user 1 is online
3:00 | 1 | stop <-- user 1 is disconnected
3:10 | 1 | stop <-- a user can have multiple stops
3:20 | 1 | stop <-- a user can have multiple stops
3:15 | 2 | start <-- user 2 connected
4:00 | 2 | stop <-- user 2 disconnected
5:00 | 1 | start <-- user 1 is online again
5:10 | 2 | stop <-- user 2 still offline
9:00 | 1 | start <-- user 1 connected
10:00 | 1 | stop <-- user 1 disconnected
TIMESTAMP
and user and status are STRING
. I want to know with a query which users were offline at 08:00 and how much time they were offline (elapsed time
from 00:08 to the first STOP after a START).
So the result might be:
user | status | time (sec)
1 | start | 180 <-- 3 minutes online since the 05:00 start
2 | stop | 170 <-- 4 minutes offline since the first 04:00 stop
I have tried with aggregation, last_value over, lag over but I can only get the previous event.
Upvotes: 0
Views: 55
Reputation: 75735
It's a cool challenge! Here what I propose for the user connected at a specific date
WITH
#Select all the previous state for each user
select_previous AS (
SELECT
time,
user,
state,
LAG(state) OVER (PARTITION BY user ORDER BY time) AS previous_state
FROM
`gbl-imt-homerider-basguillaueb.bqml_test.user_up_sof`),
#Keep only the state change date
state_change AS (
SELECT
*
FROM
select_previous
WHERE
previous_state != state
OR previous_state IS NULL ),
#Select the latest time registered before the wanted timeline
max_time AS(
SELECT
MAX(time) AS time,
user
FROM
state_change
WHERE
time < TIMESTAMP("2019-08-01 00:08:00")
GROUP BY
user)
#Join the state change list with the latest time registered per user. Keep only the state stop
SELECT
state_change.*
FROM
max_time
JOIN
state_change
ON
max_time.time = state_change.time
AND max_time.user = state_change.user
WHERE
state = "stop"
There is maybe better, but it works.
Same thing for the offline time. Some boundaries has to be added, but the core of the resquest is here:
WITH
#Select all the previous state for each user
select_previous AS (
SELECT
time,
user,
state,
LAG(state) OVER (PARTITION BY user ORDER BY time) AS previous_state
FROM
`gbl-imt-homerider-basguillaueb.bqml_test.user_up_sof`),
#Keep only the state change date
state_change AS (
SELECT
*
FROM
select_previous
WHERE
previous_state != state
OR previous_state IS NULL ),
#get the previous date change
time_previous_change AS (
SELECT
*,
LAG(time) OVER (PARTITION BY user ORDER BY time) AS previous_time
FROM
state_change )
#perform a sum of the timeDiff, only when the state is start and the previous is stop (offline time)
SELECT
user,
SUM(timestamp_diff (time,
previous_time,
second)) AS elapsed_in_second
FROM
time_previous_change
WHERE
state="start"
AND previous_state="stop"
GROUP BY
user
Not sure to understand the example that you provide. I can tune more these queries if you need.
Upvotes: 1