NeverwinterMoon
NeverwinterMoon

Reputation: 2451

How to calculate average time between events for each week grouped by ID using BigQuery

Edit I've updated the examples to reflect the logic, before they were used to reflect data structure only.

Having data like this:

id    timestamp

3     2022-10-01 12:45:47 UTC
3     2022-10-01 12:45:27 UTC
3     2022-10-01 12:45:17 UTC
1     2022-09-29 15:26:40 UTC
2     2022-09-29 13:15:38 UTC
1     2022-09-29 12:08:28 UTC
2     2022-09-26 16:17:15 UTC

(Basically, every id can have a lot of timestamps throughout a single day over months of time.)

I would like to have have average time between each two timestamps within a week, so something like:

id   week          averageTimeSec

3    2022-09-26    15 (2022-10-01 12:45:27 - 2022-10-01 12:45:17 = 10 sec, 2022-10-01 12:45:47 - 2022-10-01 12:45:27 = 20 sec)

2    2022-09-26    248303 (2022-09-29T13:15:38Z - 2022-09-26T16:17:15Z = 248303 sec)

1    2022-09-26    11892 (2022-09-29 15:26:40 - 2022-09-29 12:08:28 = 11892 sec)

The idea is to see the frequency of these events generated over a long period of time. Say, two months ago some ID would generate an event on average every 100 seconds, one month ago - 50 seconds, and so on.

I am normally not working with BigQuery or SQL, and facing a task like this tripped me over. I can imagine an approach I would use to solve the problem using InfluxDB's Flux, but it looks like that knowledge helps me none when it comes to BigQuery... I've started reading BigQuery documentation and so far, I don't have a proper idea on how to achieve the desired result. If someone could point me in the right direction, I would greatly appreciate it.

What I have managed to achieve was something like this (just to see the average amount of events between IDs within each week and not the actual time in-between the events):

SELECT week, AVG(connectionCount)
FROM (SELECT id,
             TIMESTAMP_TRUNC(timestamp, WEEK) week,
             COUNT(timestamp)                 connectionCount
      FROM `allEvents`

      GROUP BY id, week

      ORDER BY id, week)

GROUP BY week

ORDER BY week

Solution

Thanks to https://stackoverflow.com/users/356506/daryl-wenman-bateson I think I have exactly what I needed. Here is a full example with data:

WITH testData AS (
  SELECT '3' as id, TIMESTAMP('2022-10-01 12:45:47 UTC') as timestamp UNION ALL
  SELECT '3', TIMESTAMP('2022-10-01 12:45:27 UTC') UNION ALL
  SELECT '3', TIMESTAMP('2022-10-01 12:45:17 UTC') UNION ALL
  SELECT '1', TIMESTAMP('2022-09-29 15:26:40 UTC') UNION ALL
  SELECT '2', TIMESTAMP('2022-09-29 13:15:38 UTC') UNION ALL
  SELECT '1', TIMESTAMP('2022-09-29 12:08:28 UTC') UNION ALL
  SELECT '2', TIMESTAMP('2022-09-26 16:17:15 UTC')
)

  SELECT
    id, 
    week,
    AVG(timeDifference) diff
  FROM 
    (
      SELECT  
        id, 
        timestamp,
        TIMESTAMP_TRUNC(timestamp, ISOWEEK) week, 
        UNIX_SECONDS(timestamp) - LAG(UNIX_SECONDS(timestamp)) OVER (PARTITION BY id ORDER BY timestamp) AS timeDifference 
      FROM
        testData
    ) 
  GROUP BY 
    id, 
    week
  
  ORDER BY
    diff

Here is the output: enter image description here

Upvotes: 1

Views: 620

Answers (1)

Daryl Wenman-Bateson
Daryl Wenman-Bateson

Reputation: 3954

Use LAG to identify to identify the last time, partitioned by date and ordered by the timestamp column and then average the result.

SELECT  id, 
        week,
        avg(timedifference)  diff
FROM 
(
  SELECT  id, 
          event,
          TIMESTAMP_TRUNC(event, WEEK)week, 
          UNIX_SECONDS(event) - LAG(UNIX_SECONDS(event) ) OVER (PARTITION BY id ORDER BY event) AS timedifference 
  FROM    `DataSet.TimeDif` 
) 
GROUP BY id, 
        week

The following result is returned for your data

(note: numbers are large because your events are several days apart and shown in seconds)

Average seconds difference between two events with same id

Upvotes: 1

Related Questions