Edit: I've updated the examples to reflect the logic; previously they only reflected the data structure.
Having data like this:
id timestamp
3 2022-10-01 12:45:47 UTC
3 2022-10-01 12:45:27 UTC
3 2022-10-01 12:45:17 UTC
1 2022-09-29 15:26:40 UTC
2 2022-09-29 13:15:38 UTC
1 2022-09-29 12:08:28 UTC
2 2022-09-26 16:17:15 UTC
(Basically, every id can have many timestamps within a single day, over a span of months.)
I would like to get the average time between each pair of consecutive timestamps within a week, so something like:
id week averageTimeSec
3 2022-09-26 15 (2022-10-01 12:45:27 - 2022-10-01 12:45:17 = 10 sec, 2022-10-01 12:45:47 - 2022-10-01 12:45:27 = 20 sec)
2 2022-09-26 248303 (2022-09-29T13:15:38Z - 2022-09-26T16:17:15Z = 248303 sec)
1 2022-09-26 11892 (2022-09-29 15:26:40 - 2022-09-29 12:08:28 = 11892 sec)
The idea is to see the frequency of these events over a long period of time. Say, two months ago some ID would generate an event on average every 100 seconds, one month ago every 50 seconds, and so on.
I don't normally work with BigQuery or SQL, and a task like this tripped me up. I can imagine an approach to solving the problem with InfluxDB's Flux, but that knowledge doesn't seem to help me at all when it comes to BigQuery... I've started reading the BigQuery documentation, but so far I don't have a clear idea of how to achieve the desired result. If someone could point me in the right direction, I would greatly appreciate it.
What I have managed to achieve is something like this (it only shows the average number of events per id within each week, not the actual time between the events):
SELECT week, AVG(connectionCount)
FROM (
  SELECT id,
         TIMESTAMP_TRUNC(timestamp, WEEK) week,
         COUNT(timestamp) connectionCount
  FROM `allEvents`
  GROUP BY id, week
  ORDER BY id, week
)
GROUP BY week
ORDER BY week
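For the sample data above, everything falls into a single WEEK bucket (TIMESTAMP_TRUNC with WEEK truncates to the preceding Sunday, 2022-09-25 here), so by my hand calculation this returns roughly:
week AVG(connectionCount)
2022-09-25 00:00:00 UTC 2.33 (average of 3, 2 and 2 events per id)
That counts events but says nothing about the gaps between them.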
Thanks to https://stackoverflow.com/users/356506/daryl-wenman-bateson I think I have exactly what I needed. Here is a full example with data:
WITH testData AS (
  SELECT '3' AS id, TIMESTAMP('2022-10-01 12:45:47 UTC') AS timestamp UNION ALL
  SELECT '3', TIMESTAMP('2022-10-01 12:45:27 UTC') UNION ALL
  SELECT '3', TIMESTAMP('2022-10-01 12:45:17 UTC') UNION ALL
  SELECT '1', TIMESTAMP('2022-09-29 15:26:40 UTC') UNION ALL
  SELECT '2', TIMESTAMP('2022-09-29 13:15:38 UTC') UNION ALL
  SELECT '1', TIMESTAMP('2022-09-29 12:08:28 UTC') UNION ALL
  SELECT '2', TIMESTAMP('2022-09-26 16:17:15 UTC')
)
SELECT
  id,
  week,
  AVG(timeDifference) diff
FROM (
  SELECT
    id,
    timestamp,
    TIMESTAMP_TRUNC(timestamp, ISOWEEK) week,
    UNIX_SECONDS(timestamp) - LAG(UNIX_SECONDS(timestamp)) OVER (PARTITION BY id ORDER BY timestamp) AS timeDifference
  FROM testData
)
GROUP BY
  id,
  week
ORDER BY
  diff
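Running that against the testData above should, as far as I can tell, reproduce the desired output from the top of the question (the first event of each id has no previous row, so its NULL difference is simply ignored by AVG); ISOWEEK truncates to the Monday, 2022-09-26:
id week diff
3 2022-09-26 00:00:00 UTC 15.0
1 2022-09-26 00:00:00 UTC 11892.0
2 2022-09-26 00:00:00 UTC 248303.0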
Use LAG to identify the previous time, partitioned by id and ordered by the timestamp column, and then average the result.
SELECT
  id,
  week,
  AVG(timedifference) diff
FROM (
  SELECT
    id,
    event,
    TIMESTAMP_TRUNC(event, WEEK) week,
    UNIX_SECONDS(event) - LAG(UNIX_SECONDS(event)) OVER (PARTITION BY id ORDER BY event) AS timedifference
  FROM `DataSet.TimeDif`
)
GROUP BY
  id,
  week
The following result is returned for your data (note: the numbers are large because your events are several days apart and the differences are in seconds).
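Assuming DataSet.TimeDif holds the question's sample rows as id and event, the values work out roughly to the following (WEEK truncates to the preceding Sunday, hence 2022-09-25 rather than the Monday-based 2022-09-26 used in the question):
id week diff
1 2022-09-25 00:00:00 UTC 11892.0
2 2022-09-25 00:00:00 UTC 248303.0
3 2022-09-25 00:00:00 UTC 15.0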