Use NTILE function over a rolling window in BigQuery (rolling binning / rolling quantile)

Question

I have a simple table with Time column (1 minute frequency) and another column of float value, let's call it column A. My goal is to create a query in BigQuery that does the following:

For each row, look at the window of the preceding 24 hours and the following 24 hours, and get the relevant "bucket number" (as in NTILE(5) func) for the specific row which is the center of the window.

I'm trying to achieve a "rolling binning" - kind of similar to "rolling quantile" (as in pandas Rolling Quantile) but with the bin number. I can do it quite simply in python with pd.DataFrame.rolling.apply and a custom function for binning and return a value, but I want to do it using BigQuery.

I'm having trouble combining both NTILE function and a rolling window.

I was able to create a rolling window (referred to as Analytic function in the docs) and perform a simple aggregation such as average by using

SELECT Time, AVG(A)
OVER (ORDER BY UNIX_SECONDS(Time) ASC RANGE BETWEEN 86400 PRECEDING AND 86400 FOLLOWING) as rolling_avg_24_hours_each_side
FROM `my_project.my_table`

(86400 comes from 3600 seconds in a hour * 24 hours. Is there an alternative to the use of magic numbers?)

using NTILE(5) function over my entire table is quite easy, but I can't use it in my rolling window.

As the docs of Numbering functions states, using a window_frame_clause is not allowed, therefore I can't use the RANGE... statement combined with NTILE.

So my main question is how to combine NTILE function with a rolling window, or are there other solutions to achieve "rolling binning" with a rolling window?

Gordon Linoff · Accepted Answer

There is no simple way to do this in BigQuery as far as I know. The following method is not simple but it might work in your case. Start by bringing together the rows that you need for each window:

select t.time, t.a, array_agg(t2.a) as window_a
from `my_project.my_table` t join
     `my_project.my_table` t2
     on t2.time >= date_add(t.time, interval -24 hour) and
        t2.time < date_add(t.time, interval 24 hour)
group by t.time, t.a;

Then calculate the the tile manually:

with t as (
      select t.time, t.a, array_agg(t2.a) as window_a
      from `my_project.my_table` t join
           `my_project.my_table` t2
           on t2.time >= datetime_add(t.time, interval -24 hour) and
              t2.time < datetime_add(t.time, interval 24 hour)
      group by t.time, t.a
     )
select t.*,
       (select ceiling(min(case when t.a = a2 then seqnum end) * 5.0 / max(cnt))
        from (select a2,
                     row_number() over (order by a2) as seqnum,
                     count(*) over () as cnt
              from unnest(t.window_a) a2
             ) w
       ) as rolling_tile
from t;

Use NTILE function over a rolling window in BigQuery (rolling binning / rolling quantile)

Answers (2)

Related Questions