Patrick Plaatje
Patrick Plaatje

Reputation: 109

Rolling 31 day average including previous 31 days from BigQuery

I've created a query that returns a counts the number of rows (records) for the last 31 days (based on a timestamp field) and include the previous 31 days before that period as well, eg. produce a query that returns both. I now have the following query:

SELECT
  COUNT(*) OVER(ORDER BY datetime DESC RANGE BETWEEN 2678400000 PRECEDING AND CURRENT ROW) AS rolling_avg_31_days,
  COUNT(*) OVER(ORDER BY datetime DESC RANGE BETWEEN 5356800000 PRECEDING AND CURRENT ROW) AS rolling_avg_62_days
FROM `p`
ORDER BY rolling_avg_31_days DESC LIMIT 1

And it returns some data, but not really the data I was hoping for:

rolling_avg_31_days | rolling_avg_62_days   
          8,422,783 | 9,790,304

If I query the same table with (rolling 62 days):

SELECT COUNT(*)  FROM `p`
WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-5356800000 AND datetime < UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000'

I get a value of 6,192,920

I'm not sure what I'm doing wrong. Any help is much appreciated!

Upvotes: 2

Views: 114

Answers (2)

Patrick Plaatje
Patrick Plaatje

Reputation: 109

Instead of going with the above, I've decided to change the query to be a bit simpler:

SELECT
(SELECT COUNT(DISTINCT(wasabi_user_id)) FROM `p` WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-5356800000 AND datetime < UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000) as _62days,
(SELECT COUNT(DISTINCT(wasabi_user_id)) FROM `p` WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000) AS _31days
FROM `mycujoo_kafka_public.v_web_event_pageviews` LIMIT 1

Thanks @Mikhail for the help though!

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

So, the first query is correct and gives you rolling counts (31 and 62 days) based on the timestamp field - also, because of order by .. desc and limit 1 you are getting the most row that has biggest rolling_avg_31_days which is not necessarily row for the most recent () datetime

The second query just produces count of rows between 62 and 31 days based on the current timestamp - which is as explain above is not what first query produces - thus the discrepancy

To further troubleshoot or to try to understand difference - change ORDER BY rolling_avg_31_days DESC LIMIT 1 to ORDER BY datetime DESC LIMIT 1 and also add datetime to select statement so you can see if it belong to current date or close to current statement so results are comparable

Upvotes: 1

Related Questions