Kristoph Matthews
Kristoph Matthews

Reputation: 359

SQL: How to construct a time series from irregular data and then subsequently calculate a rolling average over it

I am trying to calculate a rolling average of data from incident reports. The exact quantity I'm looking for is the 30-day-mean-time-to-resolution (mttr) which means the average of the time it takes to resolve incidents in the last 30 days.

My incidents table looks something like this:

| incident_id | start_datetime        | end_datetime          |
|-------------|-----------------------|-----------------------|
| 1           | '2020-02-01T10:13:00' | '2020-02-01T10:59:33' |
| 2           | '2020-02-01T17:55:13' | '2020-02-02T00:35:28' |
| 3           | '2020-02-03T13:33:01' | '2020-02-03T15:54:01' |

What I want is something like this (the numbers are made up so don't try to actually calculate-- just note that the datetime intervals are every hour):

| datetime              | mttr_last30days_in_hours |
|-----------------------|--------------------------|
| '2020-02-01T10:00:00' | 5.7                      |
| '2020-02-01T11:00:00' | 5.6                      |
| '2020-02-02T12:00:00' | 5.8                      |

I can calculate the mttr in the last 30 days really easily if I'm doing it just for one point in time:

SELECT avg(end_datetime - start_datetime) mttr_last30days_in_hours
FROM incidents
WHERE datetime_diff(current_datetime(), start_datetime, DAY) <= 30 

The problem is this just gives me ONE number. How do I create a time series spanning the range of say, the first incident's start_datetime (min(start_datetime)) to the current time, and then get a rolling 30 day average with evenly spaced, hourly intervals (like the example table above)?

Upvotes: 0

Views: 221

Answers (1)

rmesteves
rmesteves

Reputation: 4085

If you have an unique field in your table, you can try doing that:

WITH

t_filter AS(
  SELECT 
     *
  FROM
    incidents 
  WHERE datetime_diff(current_datetime(), start_datetime, DAY) <= 30 
),

t_dates AS (
  SELECT
    unique_key,
    GENERATE_DATE_ARRAY(DATE(start_datetime), CURRENT_DATE(), INTERVAL 1 DAY) AS date_array
  FROM
    t_filter
),

t_hour AS (
  SELECT *
  FROM
  UNNEST(["00:00:00",
    "01:00:00",
    "02:00:00",
    "03:00:00",
    "04:00:00",
    "05:00:00",
    "06:00:00",
    "07:00:00",
    "08:00:00",
    "09:00:00",
    "10:00:00",
    "11:00:00",
    "12:00:00",
    "13:00:00",
    "14:00:00",
    "15:00:00",
    "16:00:00",
    "17:00:00",
    "18:00:00",
    "19:00:00",
    "20:00:00",
    "21:00:00",
    "22:00:00",
    "23:00:00"]) h 
),

sequence AS(
  SELECT
    unique_key,
    CONCAT(CAST(arr AS string),"T", h) date_hour
  FROM
    t_dates,
    UNNEST(date_array) arr,
    t_hour
)

SELECT
  date_hour,
  AVG(end_datetime - start_datetime)
FROM
  sequence
LEFT JOIN
  t_filter
ON
incidents.unique_key = sequence.unique_key
GROUP BY
  date_hour

I hope it helps

Upvotes: 1

Related Questions