Alg_D
Alg_D

Reputation: 2390

Cumulative sum over a table

What is the best way to perform a cumulative sum over a table in Postgres, in a way that can bring the best performance and flexibility in case more fields / columns are added to the table.

Table

    a   b   d
1   59  15  181
2       16  268
3           219
4           102

Cumulative

    a   b   d
1   59  15  181
2       31  449
3           668
4           770

Upvotes: 2

Views: 1413

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656231

I think what you are really looking for is this:

SELECT id
     , sum(a) OVER (PARTITION BY a_grp ORDER BY id) as a
     , sum(b) OVER (PARTITION BY b_grp ORDER BY id) as b
     , sum(d) OVER (PARTITION BY d_grp ORDER BY id) as d 
FROM  (
   SELECT *
        , count(a IS NULL OR NULL) OVER (ORDER BY id) as a_grp
        , count(b IS NULL OR NULL) OVER (ORDER BY id) as b_grp
        , count(d IS NULL OR NULL) OVER (ORDER BY id) as d_grp
   FROM   tbl
   ) sub
ORDER  BY id;

The expression count(col IS NULL OR NULL) OVER (ORDER BY id) forms groups of consecutive non-null rows for a, b and d in the subquery sub.

In the outer query we run cumulative sums per group. NULL values form their own group and stay NULL automatically. No additional CASE statement necessary.

SQL Fiddle (with some added values for column a to demonstrate the effect).

Upvotes: 3

vol7ron
vol7ron

Reputation: 42095

Window functions for running sum.

SELECT sum(a) OVER (ORDER BY d) as "a",
       sum(b) OVER (ORDER BY d) as "b",
       sum(d) OVER (ORDER BY d) as "d" 
FROM table;

If you have more than one running sum, make sure the orders are the same.


It's important to note that if you want your columns to appear as the aggregate table in your question (each field uniquely ordered), it'd be a little more involved.


Update: I've modified the query to do the required sorting, without a given common field.

SQL Fiddle: (1) Only Aggregates, or (2) Source Data Beside Running Sum

WITH 
rcd AS ( 
  select row_number() OVER() as num,a,b,d 
  from tbl
),
sorted_a AS (
  select row_number() OVER(w1) as num, sum(a) over(w2) a
  from tbl
  window w1 as (order by a nulls last),
         w2 as (order by a nulls first)
),
sorted_b AS (
  select row_number() OVER(w1) as num, sum(b) over(w2) b
  from tbl
  window w1 as (order by b nulls last),
         w2 as (order by b nulls first)
),
sorted_d AS (
  select row_number() OVER(w1) as num, sum(d) over(w2) d
  from tbl
  window w1 as (order by d nulls last),
         w2 as (order by d nulls first)
)

SELECT sorted_a.a, sorted_b.b, sorted_d.d 
FROM rcd 
JOIN sorted_a USING(num)
JOIN sorted_b USING(num)
JOIN sorted_d USING(num)
ORDER BY num;

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269445

You can use window functions, but you need additional logic to avoid values where there are NULLs:

SELECT id,
       (case when a is not null then sum(a) OVER (ORDER BY id) end) as a,
       (case when b is not null then sum(b) OVER (ORDER BY id) end) as b,
       (case when d is not null then sum(d) OVER (ORDER BY id) end) as d 
FROM table;

This assumes that the first column that specifies the ordering is called id.

Upvotes: 3

Related Questions