Hippocrates
Hippocrates

Reputation: 2540

Sampling time series data with BigQuery SQL

I have some time series data in BigQuery that looks like this

timestamp price
2018-08-20 04:01:00 5
2018-08-20 04:04:00 6
2018-08-20 04:05:00 5
2018-08-20 04:06:00 5
2018-08-20 04:10:00 5
2018-08-20 04:22:00 6
2018-08-20 04:30:00 6
2018-08-20 04:59:00 7

Optimally I would have a data point exactly every 1 minute but that is not the case. I will sometimes have fewer data points, duplicates, or too few/many clustered around one time.

I'd like to select some data and chart it using a resolution of, say, 4 samples per hour, ideally at 0, 15, 30, 45 minutes, or as close to that as possible. I want to take at most 4 samples per hour, and have them somewhat evenly spaced, so it may end up that taking 3 samples at minutes 2, 23, 43 makes more sense.

In other words, select the times that get me the closest to one every 15 minutes.

Is there an elegant way to do something like this in BQ sql?

Upvotes: 1

Views: 423

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270773

You can (relatively) easily select the first in each 15 minutes interval:

select t.* (except seqnum)
from (select t.*,
             row_number() over (partition by floor(timestamp_seconds(timestamp) / (60 * 15)) order by timestamp) as seqnum
      from t
     ) t
where seqnum = 1;

You can also do this using qualify:

select t.*
from t
qualify row_number() over (partition by floor(timestamp_seconds(timestamp) / (60 * 15)) order by timestamp) = 1;

The idea is to convert the timestamp to a number of seconds. Then divide this by the 60*15 -- the number of seconds in a 15 minute interval.

Upvotes: 3

Related Questions