Reputation: 9
I have a dataset in BigQuery with a few float columns, let's call them amount_1, amount_2, etc. I also have a day
column, it's a timestamp in the day format, for each day, there is only 1 row, meaning the data is already grouped.
For each day of the month, I want to calculate the SUM(amount_1), considering the last day of the previous month, and going back 120 days, meaning that, in the example of the day = '2024-04-05 00:00:00 UTC', I want to calculate the sum(amount_1) between the dates '2023-12-03' and '2024-03-31', this means that, for every day within a given month, the sum should be the same.
I tried this, but it didn't work:
with LastDayPreviousMonth AS (
SELECT
day,
LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
FROM
main_table
)
select
day
, SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day
This does not work, when I try to debug it, I ask for the min(day) to know the window I'm searching, like this:
with LastDayPreviousMonth AS (
SELECT
day,
LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
FROM
main_table
)
select
day
, SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day
I expected the min_day value to come as '2023-12-03' for the row where day = '2024-04-05 00:00:00 UTC', but it's always '2024-01-01 00:00:00 UTC' for every day of April, and '2023-12-01 00:00:00 UTC' for every day of March, and so on, could someone point out what I'm doing wrong?
EDIT: When I say it "doesn't work", I mean that, the window I'm running the sum over is not the window I want, I did the calculations by hand on excel to verify this, and I did the MIN function to check that I'm actually not looking at the correct window
Upvotes: 0
Views: 113
Reputation: 3528
I guess, the join enlarges the dataset. Please use group by 1
to have for each day only one entry.
You want to go back 120 days, therefore you used the 10368000 seconds.
with
main_table as (
SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
SELECT
day,
LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
FROM
main_table
group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
, SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
main_table as M
INNER JOIN LastDayPreviousMonth as L on L.day = M.day
The join is not needed:
with
main_table as (
SELECT 1 amount_1, * from unnest(generate_date_array("2023-01-01",current_date()) ) as day,unnest([1,2]) as test
),
LastDayPreviousMonth AS (
SELECT
*,#day,
LAST_DAY(DATE_SUB(CAST(timestamp_trunc(day,month) AS DATE), INTERVAL 1 MONTH)) AS last_day_previous_month
FROM
main_table
#group by 1 --- to have for each day only one row
)
select
M.day,
last_day_previous_month
, SUM(amount_1) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) AS amount_sum
, min(M.day) OVER(ORDER BY UNIX_SECONDS(cast(last_day_previous_month as timestamp)) RANGE BETWEEN 10368000 PRECEDING AND 0 PRECEDING) as min_day
FROM
LastDayPreviousMonth as M
Upvotes: 0