user1147339
user1147339

Reputation: 41

Efficient sliding window sum over a database table

A database has a transactions table with columns: account_id, date, transaction_value (signed integer). Another table (account_value) stores the current total value of each account, which is the sum of all transaction_values per account. It is updated with a trigger on the transactions table (i.e., INSERTs, UPDATEs and DELETEs to transactions fire the trigger to change the account_value.)

A new requirement is to calculate the account's total transaction value only over the last 365 days. Only the current running total is required, not previous totals. This value will be requested often, almost as often as the account_value.

How would you implement this "sliding window sum" efficiently? A new table is ok. Is there a way to avoid summing over a year's range every time?

Upvotes: 4

Views: 4719

Answers (5)

user330315
user330315

Reputation:

This can be done with standard windowing functions:

SELECT account_id,
       sum(transaction_value) over (partition by account_id order by date)
FROM transactions

The order by inside the over() claues makes the sum a "sliding sum".

For the "only the last 356 days" you'd need a second query that will limit the rows in the WHERE clause.

The above works in PostgreSQL, Oracle, DB2 and (I think) Teradata. SQL Server does not support the order by in the window definition (the upcoming Denali version will AFAIK)

Upvotes: 2

MatBailie
MatBailie

Reputation: 86765

I'll avoid any actual SQL here as it varies a lot depending on the variety of SQL that you are using.


You say that you have a trigger to maintain the existing running total.

I presume that it also (or perhaps a nightly process) creates new daily records in the account_value table. Then INSERTs, UPDATEs and DELETEs fire the trigger to add or subtract from the existing running total?

The only changes you need to make are:
- add a new field, "yearly_value" or something
- have the existing trigger update that in the same way as the existing field
- use gbn's type of answer to create today's records (or however far you backdate)
- but initialise each new daily record in a slightly different way...

When you insert a new row for a new day, it should be initialised to yesterday's value - the value 365 days ago. After that, the behavior should be identical to what you're already used to.

Upvotes: 0

user359040
user359040

Reputation:

You are going to need a one-off script to populate the existing table with values for the preceding year for each existing record - that will need to run for the whole of the previous year for each record generated.

Once the rolling year column is populated, one alternative to summing the previous year would be to derive each new record's value as the previous record's rolling year value, plus the transaction value(s) since the last update, minus the transaction values between one year prior to the last update and one year ago from now.

I suggest trying both approaches against realistic test data to see which will perform better - I would expect summing the whole year to perform at least as well where data is relatively sparse, while the difference method may work better if data is to be frequently updated on each account.

Upvotes: 0

jon
jon

Reputation: 6246

If queries of the transactions table are more frequent than inserts to the transactions table, then perhaps a view is the way to go?

Upvotes: 1

gbn
gbn

Reputation: 432431

As simple as this?

SELECT
   SUM(transaction_value), account_id
FROM
   transactions t
WHERE
   -- SQL Server, Sybase       t.DATE >= DATEADD(year, -1, GETDATE())
   -- MySQL            t.DATE >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY
   account_id;

You may want to remove the time component from the date expressions using DATE (MySQL) or this way in SQL Server

Upvotes: 1

Related Questions