DarkLeafyGreen
DarkLeafyGreen

Reputation: 70466

Calculate 7, 14 and 30 day moving average in bigquery

I am playing around with bigquery. I have IoT uptime recordings as input:

+---------------+-------------+----------+------------+
|   device_id   |  reference  |  uptime  | timestamp  |
+---------------+-------------+----------+------------+
| 1             | 1000-5      |  0.7     | 2019-02-12 |
| 2             | 1000-6      |  0.9     | 2019-02-12 |
| 1             | 1000-5      |  0.8     | 2019-02-11 |
| 2             | 1000-6      |  0.95    | 2019-02-11 |
+---------------+-------------+----------+------------+

I want to calculate the 7, 14 and 30 day moving average of the uptime grouped by device. The output should look as follows:

+---------------+-------------+---------+--------+--------+
|   device_id   |  reference  |  avg_7  | avg_14 | avg_30 |
+---------------+-------------+---------+--------+--------+
| 1             | 1000-5      |  0.7    | ..     | ..     |
| 2             | 1000-6      |  0.9    | ..     | ..     |
+---------------+-------------+---------+--------+--------+

What I have tried:

SELECT
    device_id,
    AVG(uptime) OVER (ORDER BY day RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
  SELECT device_id, uptime, UNIX_DATE(DATE(timestamp)) as day FROM `uptime_recordings`
)
GROUP BY device_id, uptime, day

I have recordings for 1000 distinct devices and 200k readings. The grouping does not work and the query returns 200k records instead of 1000. Any ideas whats wrong?

Upvotes: 1

Views: 3682

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59325

I have recordings for 1000 distinct devices and 200k readings. The grouping does not work and the query returns 200k records instead of 1000. Any ideas whats wrong?

Instead of GROUP BY device_id, uptime, day do GROUP BY device_id, day.

A full working query:

WITH data 
AS (
  SELECT title device_id, views uptime, datehour timestamp
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
  AND wiki='br'
  AND title='Chile'
)

SELECT device_id, day
  , AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
  SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
  FROM `data`
  GROUP BY device_id, day
)

enter image description here

Edit: As per requested in the comments, not sure what's the goal of summarizing all of the 7d averages:

WITH data 
AS (
  SELECT title device_id, views uptime, datehour timestamp
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
  AND wiki='br'
  AND title IN ('Chile', 'Saozneg')
)

SELECT device_id, AVG(avg_7d) avg_avg_7d
FROM (
  SELECT device_id, day
    , AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
  FROM (
    SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
    FROM `data`
    GROUP BY device_id, day
  )
)
GROUP BY device_id 

enter image description here

Upvotes: 3

Related Questions