OAK
OAK

Reputation: 3166

SQL - How Cumulative Sum when group by values are missing

As a follow-up to my previous ticket - which I now believe the example was too simple (previous question), I prepared an example of a scenario where I'm looking to aggregate column cus_sum group on the date_col column and the cus column representing the unique customer number. I wish to generate a series of dates for instance (function generate series) from the 1st of January 2018 till the 10th of January 2018 and then have a cumulative sum of column cus_sum for each customer. As in the case below, you can imagine that there are days without information at all and days where not all customers have any records - regardless i want to show their cumulative sum throughout that period.

CREATE TABLE test2 (date_col date, cus int, cus_sum int);
    insert into test2 values ('2018-01-01', 1, 5);
    insert into test2 values ('2018-01-02', 1, 12);
    insert into test2 values ('2018-01-02', 2, 14);
    insert into test2 values ('2018-01-03', 2, 8);
    insert into test2 values ('2018-01-03', 2, 10);
    insert into test2 values ('2018-01-04', 1, 22);
    insert into test2 values ('2018-01-06', 2, 20);
    insert into test2 values ('2018-01-06', 1, 5);
    insert into test2 values ('2018-01-07', 1, 45);
    insert into test2 values ('2018-01-08', 2, 32);

The output should look like:

date_col       cus  cum_sum
"2018-01-01"    1   5
"2018-01-01"    2   0
"2018-01-02"    1   17
"2018-01-02"    2   14
"2018-01-03"    1   17
"2018-01-03"    2   32
"2018-01-04"    1   39
"2018-01-04"    2   32
"2018-01-05"    1   39
"2018-01-05"    2   32
"2018-01-06"    1   89
"2018-01-06"    2   52
"2018-01-07"    1   134
"2018-01-07"    2   52
"2018-01-08"    1   134
"2018-01-08"    1   84

Perhaps I should add that - one table I assume will be a virtual table that generates a list of dates in a given timeframe. The second table is a list of customers[1,3,4,5..10], products purchases (product volume) - which is what I wish to cumulative sum for every customer and everyday of the series.

Upvotes: 0

Views: 912

Answers (2)

klin
klin

Reputation: 121604

The cross join of generate_series() and unnest() creates a virtual table of all possible values:

select distinct
    date_col::date, 
    cus, 
    coalesce(sum(cus_sum) over (partition by cus order by date_col), 0) as cum_sum
from generate_series('2018-01-01'::date, '2018-01-08', '1d') as date_col
cross join (select distinct cus from test2) c
left join test2 using (date_col, cus)   
order by date_col, cus

  date_col  | cus | cum_sum 
------------+-----+---------
 2018-01-01 |   1 |       5
 2018-01-01 |   2 |       0
 2018-01-02 |   1 |      17
 2018-01-02 |   2 |      14
 2018-01-03 |   1 |      17
 2018-01-03 |   2 |      32
 2018-01-04 |   1 |      39
 2018-01-04 |   2 |      32
 2018-01-05 |   1 |      39
 2018-01-05 |   2 |      32
 2018-01-06 |   1 |      44
 2018-01-06 |   2 |      52
 2018-01-07 |   1 |      89
 2018-01-07 |   2 |      52
 2018-01-08 |   1 |      89
 2018-01-08 |   2 |      84
(16 rows)

It looks like there are mistakes in the OP's expected results.

DbFiddle.

Upvotes: 1

Vinit
Vinit

Reputation: 2607

Assuming that you have separate table for customers, so you can use CTE to generate the data range and then join croos join customer table to have all combinations of customer and dates, then you get the sum from test2 table. the query will look like below -

WITH DateRange AS (
        SELECT
         [MyDate] = CONVERT(DATETIME,'01/01/2018')
        UNION ALL 
        SELECT
         [MyDate] = DATEADD(DAY, 1, [Date])
        FROM
         DateRange
        WHERE
         [MyDate] <= '01/10/2018'
) SELECT
 d.[MyDate]
 c.cus
 (
  select isnull(sum(cus_sume),0) 
  from test2 t 
  where t.date = d.mydate 
  and c.cust = t.cust
 ) as cus_sum
FROM
 DateRange d 
 cross join customer c
order by d.MyDate

Upvotes: 1

Related Questions