Reputation: 45
In Google bigquery database, Convert minute level timestamp to an interval of 5 minutes. The 5 minutes interval is standard time interval . Below is just an example how i want the data to be presented
test hd_count
2013-12-20 10:40:30 1
2013-12-20 10:41:30 3
2013-12-20 10:42:30 2
2013-12-20 10:43:30 1
2013-12-20 10:44:30 1
I want this to be represented as
test_1 test_2 hd_count
2013-12-20 10:40:30 2013-12-20 10:44:30 8
I have looked into similar request in other answer but none of them seem to work for big query.Any help would be appreciated
Upvotes: 0
Views: 2796
Reputation: 4384
Here's a standard SQL UDF-based approach that enables arbitrary alignments up to millisecond precision. I use it when I'm working with more granular time intervals:
CREATE TEMPORARY FUNCTION bracketTimestampByMillis(ts TIMESTAMP, bracketMillis INT64) RETURNS TIMESTAMP AS (
TIMESTAMP_MILLIS(CAST(FLOOR(
(UNIX_MILLIS(ts) - UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))) / bracketMillis) AS INT64)
* bracketMillis + UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))));
To demonstrate, here's another UDF which consumes the first to build an array of timestamps aligned to various intervals:
CREATE TEMPORARY FUNCTION emitTimeBrackets(ts TIMESTAMP) RETURNS ARRAY<STRUCT<bracket STRING, tsVal TIMESTAMP>> AS (
[STRUCT("exact" as bracket, ts as tsVal),
STRUCT("minute", bracketTimestampByMillis(ts, 60 * 1000)),
STRUCT("5 minute", bracketTimestampByMillis(ts, 5 * 60 * 1000)),
STRUCT("15 minute", bracketTimestampByMillis(ts, 15 * 60 * 1000)),
STRUCT("hour", bracketTimestampByMillis(ts, 60 * 60 * 1000)),
STRUCT("quarter day", bracketTimestampByMillis(ts, 6 * 3600 * 1000))
]
);
SELECT emitTimeBrackets(CURRENT_TIMESTAMP()) as b
Upvotes: 1
Reputation: 173028
Below is for BigQuery Standard SQL
#standardSQL
WITH minmax AS (
SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
FROM `project.dataset.table`
), intervals AS (
SELECT
TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
FROM minmax,
UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2
you can test / play with above using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2013-12-20 10:40:30' test, 1 hd_count UNION ALL
SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
SELECT TIMESTAMP '2013-12-20 10:47:30', 1
), minmax AS (
SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
FROM `project.dataset.table`
), intervals AS (
SELECT
TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
FROM minmax,
UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2
ORDER BY test1
output is as below
test1 test2 hd_count
2013-12-20 10:40:30 UTC 2013-12-20 10:45:29 UTC 8
2013-12-20 10:45:30 UTC 2013-12-20 10:50:29 UTC 6
Upvotes: 2