Daniel Earp
Daniel Earp

Reputation: 9

How do I calculate a SUM over a specific window in BigQuery SQL?

I have a dataset in BigQuery with a few float columns, let's call them amount_1, amount_2, etc. I also have a day column, it's a timestamp in the day format, for each day, there is only 1 row, meaning the data is already grouped.

For each day of the month, I want to calculate the SUM(amount_1), considering the last day of the previous month, and going back 120 days, meaning that, in the example of the day = '2024-04-05 00:00:00 UTC', I want to calculate the sum(amount_1) between the dates '2023-12-03' and '2024-03-31', this means that, for every day within a given month, the sum should be the same.

I tried this, but it didn't work:

with LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
)
select
day
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

This does not work, when I try to debug it, I ask for the min(day) to know the window I'm searching, like this:

with LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
)
select
day
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

I expected the min_day value to come as '2023-12-03' for the row where day = '2024-04-05 00:00:00 UTC', but it's always '2024-01-01 00:00:00 UTC' for every day of April, and '2023-12-01 00:00:00 UTC' for every day of March, and so on, could someone point out what I'm doing wrong?

EDIT: When I say it "doesn't work", I mean that, the window I'm running the sum over is not the window I want, I did the calculations by hand on excel to verify this, and I did the MIN function to check that I'm actually not looking at the correct window

Upvotes: 0

Views: 113

Answers (1)

Samuel
Samuel

Reputation: 3528

I guess, the join enlarges the dataset. Please use group by 1 to have for each day only one entry.

You want to go back 120 days, therefore you used the 10368000 seconds.

with 
main_table as (
  SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
  SELECT
    day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
    group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day

The join is not needed:

with 
main_table as (
  SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
  SELECT
    *,#day,
    LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
  FROM
    main_table
    #group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
,  SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
LastDayPreviousMonth as M

Upvotes: 0

Related Questions