TheBiscuit
TheBiscuit

Reputation: 103

Create column for rolling total for the previous month of a current rows date

Context
Using Presto syntax, I'm trying to create an output table that has rolling totals of an 'amount' column value for each day in a month. In each row there will also be a column with a rolling total for the previous month, and also a column with the difference between the totals.

Output Requirements

  1. completed: create month_to_date_amount column that stores rolling total from sum of amount column. The range for the rolling total is between 1st of month and current row date column value. Restart rolling total each month.
    I already have a working query below that creates this column.
SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY
            team,
            month_id
        ORDER BY
            date ASC
    ) month_to_date_amount
FROM (
    SELECT -- this subquery is required to handle duplicate dates
        date,
        SUM(amount) AS amount,
        team,
        month_id
    FROM input_table
    GROUP BY
        date,
        team,
        month_id
) AS t
  1. create prev_month_to_date_amount column that:

    a. stores previous months rolling amount for the current rows date and team and add to same output row.

    b. Return 0 if there is no record matching the previous month date. (Ex. Prev months date for March 31 is Feb 31 so does not exist). Also a record will not exist for days that have no amount values. Example output table is below.

  2. create movement column that stores the difference amount between month_to_date_amount column and prev_month_to_date_amount column from current row.

Question

Could someone assist with my 2nd and 3rd requirements above to achieve my desired output shown below? By either adding on to my current query above, or creating another more efficient one if necessary. A solution with multiple queries is fine.

Input Table

team date amount month_id
A 2022-04-01 1 2022-04
A 2022-04-01 1 2022-04
A 2022-04-02 1 2022-04
B 2022-04-01 3 2022-04
B 2022-04-02 3 2022-04
B 2022-05-01 4 2022-05
B 2022-05-02 4 2022-05
C 2022-05-01 1 2022-05
C 2022-05-02 1 2022-05
C 2022-06-01 5 2022-06
C 2022-06-02 5 2022-06

enter image description here

Upvotes: 1

Views: 695

Answers (1)

Stewart
Stewart

Reputation: 835

This answer is a good example of using the window function LAG. In summary the query partitions the data by Team and Day of Month, and uses LAG to get the previous months amount and calculate the movement value.

e.g. for Team B data. The window function will create two partition sets: one with the Team B 01/04/2022 and 01/05/2022 rows, and one with the Team B 02/04/2022 and 02/05/2022 rows, order each partition set by date. Then for each set for each row, use LAG to get the data from the previous row (if one exists) to enable calculation of the movement and retrieve the previous months amount.

I hope this helps.

;with
totals
as
( 
  select
   *,
   sum(amount) over(
        partition by team, month_id 
         order by date, team) monthToDateAmount
  from
   ( select 
        date,
        sum(amount) as amount,
        team,
        month_id
    from input_table
    group by
        date,
        team,
        month_id
    ) as x
),
totalsWithMovement
as
(
select
  *,
  monthToDateAmount
    - coalesce(lag(monthToDateAmount) over(
          partition by team,day(date(date))
          order by team, date),0) 
      as movement,
  coalesce(lag(monthToDateAmount) over
        (partition by team, day(date(date)) 
        order by team,month_id),0) 
  as prevMonthToDateAmount
from
  totals
)
select
  date, amount, team, monthToDateAmount,
            prevMonthToDateAmount, movement  
from
  totalswithmovement
order by
  team, date;

Upvotes: 1

Related Questions