Reputation: 70466
I am playing around with bigquery. I have IoT uptime recordings as input:
+---------------+-------------+----------+------------+
| device_id | reference | uptime | timestamp |
+---------------+-------------+----------+------------+
| 1 | 1000-5 | 0.7 | 2019-02-12 |
| 2 | 1000-6 | 0.9 | 2019-02-12 |
| 1 | 1000-5 | 0.8 | 2019-02-11 |
| 2 | 1000-6 | 0.95 | 2019-02-11 |
+---------------+-------------+----------+------------+
I want to calculate the 7, 14 and 30 day moving average of the uptime grouped by device. The output should look as follows:
+---------------+-------------+---------+--------+--------+
| device_id | reference | avg_7 | avg_14 | avg_30 |
+---------------+-------------+---------+--------+--------+
| 1 | 1000-5 | 0.7 | .. | .. |
| 2 | 1000-6 | 0.9 | .. | .. |
+---------------+-------------+---------+--------+--------+
What I have tried:
SELECT
device_id,
AVG(uptime) OVER (ORDER BY day RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
SELECT device_id, uptime, UNIX_DATE(DATE(timestamp)) as day FROM `uptime_recordings`
)
GROUP BY device_id, uptime, day
I have recordings for 1000 distinct devices and 200k readings. The grouping does not work and the query returns 200k records instead of 1000. Any ideas whats wrong?
Upvotes: 1
Views: 3682
Reputation: 59325
I have recordings for 1000 distinct devices and 200k readings. The grouping does not work and the query returns 200k records instead of 1000. Any ideas whats wrong?
Instead of GROUP BY device_id, uptime, day
do GROUP BY device_id, day
.
A full working query:
WITH data
AS (
SELECT title device_id, views uptime, datehour timestamp
FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
AND wiki='br'
AND title='Chile'
)
SELECT device_id, day
, AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
FROM `data`
GROUP BY device_id, day
)
Edit: As per requested in the comments, not sure what's the goal of summarizing all of the 7d averages:
WITH data
AS (
SELECT title device_id, views uptime, datehour timestamp
FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
AND wiki='br'
AND title IN ('Chile', 'Saozneg')
)
SELECT device_id, AVG(avg_7d) avg_avg_7d
FROM (
SELECT device_id, day
, AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
FROM `data`
GROUP BY device_id, day
)
)
GROUP BY device_id
Upvotes: 3