shime
shime

Reputation: 9018

Cumulative sum with dynamic interval in PostgreSQL

I have a following table called data_users

 id   |     created_at
-------+---------------------
 20127 | 2015-01-31 04:23:46
 21468 | 2015-02-04 07:50:34
 21571 | 2015-02-04 08:23:50
 20730 | 2015-03-12 10:20:16
 19955 | 2015-03-30 07:44:35
 20148 | 2015-04-17 13:03:26
 21552 | 2015-05-07 19:00:00
 20145 | 2015-06-02 03:12:46
 21467 | 2015-06-03 13:21:51
 21074 | 2015-07-03 19:00:00

I'm trying to find a query for generating cumulative sums for dynamic intervals in Postgres.

SELECT DISTINCT DATE_TRUNC('DAY', GREATEST(d.date, from)) AS DATE,
           SUM(COALESCE(COUNT(u.id), 0)::INT) OVER (ORDER BY DATE)
FROM (SELECT GENERATE_SERIES(MIN(created_at)::DATE, to::DATE, '1 DAY'::INTERVAL) AS date
        FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY DATE ORDER BY DATE;

This returns the correct cumulative sums for days, but I want to support other intervals too.

I want to support intervals like MONTH, YEAR, WEEK. What do I need to change in my query to support that?

Upvotes: 1

Views: 1414

Answers (3)

Ron Ballard
Ron Ballard

Reputation: 701

Let's see if I understand it this time. I'm assuming you want separate queries for each interval. (They could be joined together.) Also I'm assuming you want cumulative for the whole series. (The accumulation could restart for each group.)

The daily accumulation can be made simpler, still giving the same results:

\set from_date '''2015-01-01'''
\set to_date '''2015-12-31'''

select distinct 
    to_char(a.series_date, 'YYYY-MM-DD') as series_date,
    count(b.id)
from
    (
        select 
            generate_series(:from_date::date, :to_date::date, '1 day'::interval) as series_date
    ) a
    left outer join 
    data_users b 
    on date_trunc('day', b.created_at) <= a.series_date
group by
    series_date 
order by
    series_date;

For weeks:

select distinct 
    to_char(b.last_day_of_week, 'Day YYYY-MM-DD') as week_ending,
    count(c.id)
from
    (
        select
            series_date as last_day_of_week
        from
            (
                select 
                    generate_series(:from_date::date, :to_date::date, '1 day'::interval) as series_date
            ) a
        where
            extract('dow' from series_date) = 6
    ) b
    left outer join 
    data_users c 
    on date_trunc('day', c.created_at) <= b.last_day_of_week
group by
    week_ending 
order by
    week_ending;

Results:

     week_ending      | count 
----------------------+-------
 Saturday  2015-01-03 |     0
 Saturday  2015-01-10 |     0
 Saturday  2015-01-17 |     0
 Saturday  2015-01-24 |     0
 Saturday  2015-01-31 |     1
 Saturday  2015-02-07 |     3
 Saturday  2015-02-14 |     3
 Saturday  2015-02-21 |     3
 Saturday  2015-02-28 |     3
 Saturday  2015-03-07 |     3
 Saturday  2015-03-14 |     4
 Saturday  2015-03-21 |     4
 Saturday  2015-03-28 |     4
 Saturday  2015-04-04 |     5
 Saturday  2015-04-11 |     5
 Saturday  2015-04-18 |     6
 Saturday  2015-04-25 |     6
 Saturday  2015-05-02 |     6
 Saturday  2015-05-09 |     7
 Saturday  2015-05-16 |     7
 Saturday  2015-05-23 |     7
 Saturday  2015-05-30 |     7
 Saturday  2015-06-06 |     9
 Saturday  2015-06-13 |     9
 Saturday  2015-06-20 |     9
 Saturday  2015-06-27 |     9
 Saturday  2015-07-04 |    10
 Saturday  2015-07-11 |    10
 Saturday  2015-07-18 |    10
 Saturday  2015-07-25 |    10
 Saturday  2015-08-01 |    10
 Saturday  2015-08-08 |    10
 Saturday  2015-08-15 |    10
 Saturday  2015-08-22 |    10
 Saturday  2015-08-29 |    10
 Saturday  2015-09-05 |    10
 Saturday  2015-09-12 |    10
 Saturday  2015-09-19 |    10
 Saturday  2015-09-26 |    10
 Saturday  2015-10-03 |    10
 Saturday  2015-10-10 |    10
 Saturday  2015-10-17 |    10
 Saturday  2015-10-24 |    10
 Saturday  2015-10-31 |    10
 Saturday  2015-11-07 |    10
 Saturday  2015-11-14 |    10
 Saturday  2015-11-21 |    10
 Saturday  2015-11-28 |    10
 Saturday  2015-12-05 |    10
 Saturday  2015-12-12 |    10
 Saturday  2015-12-19 |    10
 Saturday  2015-12-26 |    10
(52 rows)

For months, showing totals at the last day of each month:

select distinct 
    b.last_day_of_month,
    count(c.id)
from
    (
        select
            (extract(year from series_date) || '-' || extract(month from series_date) || '-' || '01')::date + 
            '1 month'::interval - 
            '1 day'::interval as last_day_of_month
        from
            (
                select 
                    generate_series(:from_date::date, :to_date::date, '1 month'::interval) as series_date
            ) a
    ) b
    left outer join 
    data_users c 
    on date_trunc('day', c.created_at) <= b.last_day_of_month
group by
    last_day_of_month 
order by
    last_day_of_month;

Results:

  last_day_of_month  | count 
---------------------+-------
 2015-01-31 00:00:00 |     1
 2015-02-28 00:00:00 |     3
 2015-03-31 00:00:00 |     5
 2015-04-30 00:00:00 |     6
 2015-05-31 00:00:00 |     7
 2015-06-30 00:00:00 |     9
 2015-07-31 00:00:00 |    10
 2015-08-31 00:00:00 |    10
 2015-09-30 00:00:00 |    10
 2015-10-31 00:00:00 |    10
 2015-11-30 00:00:00 |    10
 2015-12-31 00:00:00 |    10
(12 rows)

For years I added 3 more rows:

insert into data_users (id, created_at) values (30123, '2016-01-02 03:12:46');
insert into data_users (id, created_at) values (30234, '2016-03-03 13:21:51');
insert into data_users (id, created_at) values (30345, '2016-07-03 19:00:00');

And changed the to_date:

\set to_date '''2017-12-31'''

select distinct 
    extract('year' from b.last_day_of_year) as year,
    count(c.id)
from
    (
        select
            series_date + '1 year'::interval - '1 day'::interval as last_day_of_year
        from
            (
                select 
                    generate_series(:from_date::date, :to_date::date, '1 year'::interval) as series_date
            ) a
    ) b
    left outer join 
    data_users c 
    on date_trunc('day', c.created_at) <= b.last_day_of_year
group by
    year
order by
    year;

Results:

 year | count 
------+-------
 2015 |    10
 2016 |    13
 2017 |    13
(3 rows)

How are we doing?

Upvotes: 0

krokodilko
krokodilko

Reputation: 36137

An idea that uses case expessions:

SELECT x.date,
       count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' day 
              then u.id end ) as int_1_day,
       count( case when u.created_at >= x.date and u.created_at < x.date + interval '7' day 
              then u.id end ) as int_1_week,
       count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' month 
              then u.id end ) as int_1_month,
       count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' year 
              then u.id end ) as int_1_year
FROM (
  SELECT GENERATE_SERIES(MIN(created_at)::DATE, date '2015-07-31', '1 DAY'::INTERVAL) AS date
  FROM data_users
) x
LEFT JOIN data_users u
ON u.created_at BETWEEN x.date and x.date + interval '1' year
GROUP BY x.date
ORDER BY x.date

Live demo: http://sqlfiddle.com/#!17/05e1a/10


Note: a join condition ON u.created_at BETWEEN x.date and x.date + interval '1' year must use the longest interval for which you need to calculate counts, in the above example: + interval '1' year

Upvotes: 2

Ron Ballard
Ron Ballard

Reputation: 701

This sounds too easy; am I missing something? You can specify all the intervals you want in PostgreSQL, like this:

select
     '1 day'::interval,
     '1 week'::interval,
     '1 month'::interval,
     '1 year'::interval;
 interval | interval | interval | interval 
----------+----------+----------+----------
 1 day    | 7 days   | 1 mon    | 1 year
(1 row)

Upvotes: -1

Related Questions