Jivan
Jivan

Reputation: 23098

Time-window rolling sum in Postgresql

I'd like to know if it's possible to use window queries based on time in Postgresql.

The original data is in the first three columns (date, salesman, amount):

date salesman amount sum-3-rolling-days
2020-01-01 john 10 10
2020-01-02 john 15 25
2020-01-03 john 8 33
2020-01-04 john 12 35
2020-01-05 john 11 31
2020-01-01 daniel 5 5
2020-01-02 daniel 6 11
2020-01-03 daniel 7 18
2020-01-04 daniel 8 21
2020-01-05 daniel 9 24

The fourth column represents the total amount by this salesman during the past three rolling days.

Pandas has built-in functions to do this, but I can't think of any way to do it in Postgresql using built-in sum() over () syntax. The only way I was able to is using a convoluted mix of lateral joins and subqueries with conditions on time delta comparisons, which is inelegant to say the least.

Pandas' way (by memory, exact syntax might differ slightly) — cannot get any terser:

df.groupby('salesman').rolling('3d').sum()

Upvotes: 2

Views: 3926

Answers (2)

kometen
kometen

Reputation: 7862

You can use CTE (common table expression). Try with

with data as (
select
  date,
  salesman,
  amount
from
  foo
group by
  date,
  salesman
order by
  date,
  salesman,
)

select
  *,
  sum(amount) over (
    partition by salesman order by date,
    salesman rows between unbounded preceding and current row
  ) as rolling
from data;

Upvotes: 0

S-Man
S-Man

Reputation: 23766

demos:db<>fiddle

SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY salesman                     -- 1
        ORDER BY "date"                           -- 2
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- 3
    )
FROM mytable
  1. Build groups/partitions by the salesman. So, the following will only done within theses partitions
  2. Order them by the date column
  3. Sum only those records which are between 2 date fore the current one and the current one. This is the rolling part

If you are using Postgres 11 or later, you can define the window even more precisely using RANGE with date intervals instead of counting ROWS:

SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY salesman                    
        ORDER BY "date"                          
        RANGE BETWEEN interval '2 days' PRECEDING AND CURRENT ROW
    )
FROM mytable

Upvotes: 9

Related Questions