Kurt Maile
Kurt Maile

Reputation: 1267

Google BigQuery Standard SQL - Sales Sliding Window

I am wondering how best to do both a tumbling and sliding window in BigQuery standardSQL (not legacy).

Apache Spark makes this pretty easy - for a 'sliding' window of length 10 mins sliding every 5 mins, groupBy(window("10 minutes","5 minutes)) and for a tumbling window groupBy(window("10 minutes")).

Consider I have a simplified order row with:

orderId, 
orderPlacedTimestamp, 
orderTotals.grandTotalNet, 
orderTotals.grandTotalGross 

(note the struct on the totals)

and I need two total aggregate order sales value as:

  1. tumbling HOURLY window of sales
  2. sliding 10 MINUTE / 5 MINUTE window (as described above).

I started this in using a SQL over but got a bit messy with partitioning, timestamps and getting the right start and end windows. For each the result set should be:

windowStartTime, 
windowEndTime, 
windowTotalAmount

So the windows have a starting and ending time (which is crucial for sliding given the length of the window 10 is different to the sliding duration 5 so a given row can be in multiple windows as a result).

How do I do this in BigQuery?

UPDATE 11/06 WITH CURRENT SQL:

SELECT 
  TIMESTAMP_SECONDS(FIRST_VALUE(ts_5min*5*60) OVER(w)) as startWindowTime,
  TIMESTAMP_SECONDS(LAST_VALUE(ts_5min*5*60) OVER(w)) as endWindowTime,
  SUM(orderTotalNetConverted) OVER(ORDER BY ts_5min RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) as windowSalesTotal
FROM (
  SELECT 
    CAST(UNIX_SECONDS(TIMESTAMP_TRUNC(orderPlacedTimestamp, MINUTE))/60/5 AS INT64)
      AS ts_5min,
    orderTotalNetConverted
  FROM orders
)
WINDOW w AS (ORDER BY ts_5min RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)
ORDER BY startWindowTime desc

Upvotes: 1

Views: 1463

Answers (2)

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14014

Tumbling window is easier to do, it is just a regular GROUP BY at hourly intervals:

SELECT 
  TIMESTAMP_TRUNC(orderPlacedTimestamp, HOUR), 
  SUM(orderTotals.grandTotalNet)
FROM T
GROUP BY 1

For sliding window, I would first normalize timestamps to 5 minute intervals using following:

  1. TIMESTAMP_TRUNC(orderPlacedTimestamp, MINUTE) to get to minute boundary
  2. UNIX_SECONDS to convert to seconds since epoch
  3. Divide by 60 to get minutes
  4. Divide by 5 to get 5 minute interval
  5. Round to integer:

CAST(UNIX_SECONDS(TIMESTAMP_TRUNC(orderPlacedTimestamp, MINUTE))/60/5 AS INT64)

Now you can use standard OVER() clause to get 10 minute window, which means 2 such intervals at a time, and in order to get start time, use FIRST_VALUE analytic function:

SELECT
  orderId,
  TIMESTAMP_SECONDS(FIRST_VALUE(ts_5min*5*60) OVER(w)) startWindowTime,
  TIMESTAMP_ADD(TIMESTAMP_SECONDS(FIRST_VALUE(ts_5min*5*60) OVER(w)),
                INTERVAL 10 MINUTE) endWindowTime,
  SUM(grandTotalNet) OVER(w)
FROM (
  SELECT 
  *,
  CAST(UNIX_SECONDS(TIMESTAMP_TRUNC(orderPlacedTimestamp, MINUTE))/60/5 AS INT64) 
    AS ts_5min
  FROM t
)
WINDOW w AS (ORDER BY ts_5min RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)

Upvotes: 2

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for sliding 10 MINUTE / 5 MINUTE window

#standardSQL
WITH starts AS (
  SELECT TIMESTAMP_ADD(TIMESTAMP_TRUNC(first, HOUR), INTERVAL step MINUTE) AS start
  FROM 
    (SELECT MIN(orderPlacedTimestamp) AS first, MAX(orderPlacedTimestamp) AS last FROM YourTable), 
    UNNEST(GENERATE_ARRAY(0, TIMESTAMP_DIFF(last, TIMESTAMP_TRUNC(first, HOUR), MINUTE) , 5)) AS step
)
SELECT 
  start, 
  SUM(orderTotals.grandTotalNet) AS net, 
  SUM(orderTotals.grandTotalGross) AS gross
FROM starts AS s JOIN YourTable AS t
ON t.orderPlacedTimestamp BETWEEN s.start AND TIMESTAMP_ADD(start, INTERVAL 10 MINUTE)
GROUP BY start
ORDER BY start 

As Mosha mentioned in his answer - tumbling HOURLY window is easy - but if you will need different window side - above approach is more flexible (I think) and easy to adjust for any window size, type by changing respective 5 and 10 in query code
But overall - wanted to mention - using analytic function is more optimal than joining

Upvotes: 1

Related Questions