Reputation: 103
Context
Using Presto syntax, I'm trying to create an output table that has rolling totals of an 'amount' column value for each day in a month. In each row there will also be a column with a rolling total for the previous month, and also a column with the difference between the totals.
Output Requirements
month_to_date_amount
column that stores rolling total from
sum of amount
column. The range for the rolling total is between 1st of month and current row date
column value. Restart rolling
total each month. SELECT
*,
SUM(amount) OVER (
PARTITION BY
team,
month_id
ORDER BY
date ASC
) month_to_date_amount
FROM (
SELECT -- this subquery is required to handle duplicate dates
date,
SUM(amount) AS amount,
team,
month_id
FROM input_table
GROUP BY
date,
team,
month_id
) AS t
create prev_month_to_date_amount
column that:
a. stores previous months rolling amount for the current rows date
and team
and add to same
output row.
b. Return 0
if there is no record matching the previous month date. (Ex. Prev months date for March 31 is Feb 31 so does not exist). Also a record will not exist for days that have no amount
values. Example output table is below.
create movement
column that stores the difference
amount between month_to_date_amount
column and
prev_month_to_date_amount
column from current row.
Could someone assist with my 2nd and 3rd requirements above to achieve my desired output shown below? By either adding on to my current query above, or creating another more efficient one if necessary. A solution with multiple queries is fine.
team | date | amount | month_id |
---|---|---|---|
A | 2022-04-01 | 1 | 2022-04 |
A | 2022-04-01 | 1 | 2022-04 |
A | 2022-04-02 | 1 | 2022-04 |
B | 2022-04-01 | 3 | 2022-04 |
B | 2022-04-02 | 3 | 2022-04 |
B | 2022-05-01 | 4 | 2022-05 |
B | 2022-05-02 | 4 | 2022-05 |
C | 2022-05-01 | 1 | 2022-05 |
C | 2022-05-02 | 1 | 2022-05 |
C | 2022-06-01 | 5 | 2022-06 |
C | 2022-06-02 | 5 | 2022-06 |
Upvotes: 1
Views: 695
Reputation: 835
This answer is a good example of using the window function LAG. In summary the query partitions the data by Team and Day of Month, and uses LAG to get the previous months amount and calculate the movement value.
e.g. for Team B data. The window function will create two partition sets: one with the Team B 01/04/2022 and 01/05/2022 rows, and one with the Team B 02/04/2022 and 02/05/2022 rows, order each partition set by date. Then for each set for each row, use LAG to get the data from the previous row (if one exists) to enable calculation of the movement and retrieve the previous months amount.
I hope this helps.
;with
totals
as
(
select
*,
sum(amount) over(
partition by team, month_id
order by date, team) monthToDateAmount
from
( select
date,
sum(amount) as amount,
team,
month_id
from input_table
group by
date,
team,
month_id
) as x
),
totalsWithMovement
as
(
select
*,
monthToDateAmount
- coalesce(lag(monthToDateAmount) over(
partition by team,day(date(date))
order by team, date),0)
as movement,
coalesce(lag(monthToDateAmount) over
(partition by team, day(date(date))
order by team,month_id),0)
as prevMonthToDateAmount
from
totals
)
select
date, amount, team, monthToDateAmount,
prevMonthToDateAmount, movement
from
totalswithmovement
order by
team, date;
Upvotes: 1