Reputation: 9018
I have a following table called data_users
id | created_at
-------+---------------------
20127 | 2015-01-31 04:23:46
21468 | 2015-02-04 07:50:34
21571 | 2015-02-04 08:23:50
20730 | 2015-03-12 10:20:16
19955 | 2015-03-30 07:44:35
20148 | 2015-04-17 13:03:26
21552 | 2015-05-07 19:00:00
20145 | 2015-06-02 03:12:46
21467 | 2015-06-03 13:21:51
21074 | 2015-07-03 19:00:00
I'm trying to find a query for generating cumulative sums for dynamic intervals in Postgres.
SELECT DISTINCT DATE_TRUNC('DAY', GREATEST(d.date, from)) AS DATE,
SUM(COALESCE(COUNT(u.id), 0)::INT) OVER (ORDER BY DATE)
FROM (SELECT GENERATE_SERIES(MIN(created_at)::DATE, to::DATE, '1 DAY'::INTERVAL) AS date
FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY DATE ORDER BY DATE;
This returns the correct cumulative sums for days, but I want to support other intervals too.
I want to support intervals like MONTH
, YEAR
, WEEK
. What do I need to change in my query to support that?
Upvotes: 1
Views: 1414
Reputation: 701
Let's see if I understand it this time. I'm assuming you want separate queries for each interval. (They could be joined together.) Also I'm assuming you want cumulative for the whole series. (The accumulation could restart for each group.)
The daily accumulation can be made simpler, still giving the same results:
\set from_date '''2015-01-01'''
\set to_date '''2015-12-31'''
select distinct
to_char(a.series_date, 'YYYY-MM-DD') as series_date,
count(b.id)
from
(
select
generate_series(:from_date::date, :to_date::date, '1 day'::interval) as series_date
) a
left outer join
data_users b
on date_trunc('day', b.created_at) <= a.series_date
group by
series_date
order by
series_date;
For weeks:
select distinct
to_char(b.last_day_of_week, 'Day YYYY-MM-DD') as week_ending,
count(c.id)
from
(
select
series_date as last_day_of_week
from
(
select
generate_series(:from_date::date, :to_date::date, '1 day'::interval) as series_date
) a
where
extract('dow' from series_date) = 6
) b
left outer join
data_users c
on date_trunc('day', c.created_at) <= b.last_day_of_week
group by
week_ending
order by
week_ending;
Results:
week_ending | count
----------------------+-------
Saturday 2015-01-03 | 0
Saturday 2015-01-10 | 0
Saturday 2015-01-17 | 0
Saturday 2015-01-24 | 0
Saturday 2015-01-31 | 1
Saturday 2015-02-07 | 3
Saturday 2015-02-14 | 3
Saturday 2015-02-21 | 3
Saturday 2015-02-28 | 3
Saturday 2015-03-07 | 3
Saturday 2015-03-14 | 4
Saturday 2015-03-21 | 4
Saturday 2015-03-28 | 4
Saturday 2015-04-04 | 5
Saturday 2015-04-11 | 5
Saturday 2015-04-18 | 6
Saturday 2015-04-25 | 6
Saturday 2015-05-02 | 6
Saturday 2015-05-09 | 7
Saturday 2015-05-16 | 7
Saturday 2015-05-23 | 7
Saturday 2015-05-30 | 7
Saturday 2015-06-06 | 9
Saturday 2015-06-13 | 9
Saturday 2015-06-20 | 9
Saturday 2015-06-27 | 9
Saturday 2015-07-04 | 10
Saturday 2015-07-11 | 10
Saturday 2015-07-18 | 10
Saturday 2015-07-25 | 10
Saturday 2015-08-01 | 10
Saturday 2015-08-08 | 10
Saturday 2015-08-15 | 10
Saturday 2015-08-22 | 10
Saturday 2015-08-29 | 10
Saturday 2015-09-05 | 10
Saturday 2015-09-12 | 10
Saturday 2015-09-19 | 10
Saturday 2015-09-26 | 10
Saturday 2015-10-03 | 10
Saturday 2015-10-10 | 10
Saturday 2015-10-17 | 10
Saturday 2015-10-24 | 10
Saturday 2015-10-31 | 10
Saturday 2015-11-07 | 10
Saturday 2015-11-14 | 10
Saturday 2015-11-21 | 10
Saturday 2015-11-28 | 10
Saturday 2015-12-05 | 10
Saturday 2015-12-12 | 10
Saturday 2015-12-19 | 10
Saturday 2015-12-26 | 10
(52 rows)
For months, showing totals at the last day of each month:
select distinct
b.last_day_of_month,
count(c.id)
from
(
select
(extract(year from series_date) || '-' || extract(month from series_date) || '-' || '01')::date +
'1 month'::interval -
'1 day'::interval as last_day_of_month
from
(
select
generate_series(:from_date::date, :to_date::date, '1 month'::interval) as series_date
) a
) b
left outer join
data_users c
on date_trunc('day', c.created_at) <= b.last_day_of_month
group by
last_day_of_month
order by
last_day_of_month;
Results:
last_day_of_month | count
---------------------+-------
2015-01-31 00:00:00 | 1
2015-02-28 00:00:00 | 3
2015-03-31 00:00:00 | 5
2015-04-30 00:00:00 | 6
2015-05-31 00:00:00 | 7
2015-06-30 00:00:00 | 9
2015-07-31 00:00:00 | 10
2015-08-31 00:00:00 | 10
2015-09-30 00:00:00 | 10
2015-10-31 00:00:00 | 10
2015-11-30 00:00:00 | 10
2015-12-31 00:00:00 | 10
(12 rows)
For years I added 3 more rows:
insert into data_users (id, created_at) values (30123, '2016-01-02 03:12:46');
insert into data_users (id, created_at) values (30234, '2016-03-03 13:21:51');
insert into data_users (id, created_at) values (30345, '2016-07-03 19:00:00');
And changed the to_date:
\set to_date '''2017-12-31'''
select distinct
extract('year' from b.last_day_of_year) as year,
count(c.id)
from
(
select
series_date + '1 year'::interval - '1 day'::interval as last_day_of_year
from
(
select
generate_series(:from_date::date, :to_date::date, '1 year'::interval) as series_date
) a
) b
left outer join
data_users c
on date_trunc('day', c.created_at) <= b.last_day_of_year
group by
year
order by
year;
Results:
year | count
------+-------
2015 | 10
2016 | 13
2017 | 13
(3 rows)
How are we doing?
Upvotes: 0
Reputation: 36137
An idea that uses case expessions:
SELECT x.date,
count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' day
then u.id end ) as int_1_day,
count( case when u.created_at >= x.date and u.created_at < x.date + interval '7' day
then u.id end ) as int_1_week,
count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' month
then u.id end ) as int_1_month,
count( case when u.created_at >= x.date and u.created_at < x.date + interval '1' year
then u.id end ) as int_1_year
FROM (
SELECT GENERATE_SERIES(MIN(created_at)::DATE, date '2015-07-31', '1 DAY'::INTERVAL) AS date
FROM data_users
) x
LEFT JOIN data_users u
ON u.created_at BETWEEN x.date and x.date + interval '1' year
GROUP BY x.date
ORDER BY x.date
Live demo: http://sqlfiddle.com/#!17/05e1a/10
Note: a join condition ON u.created_at BETWEEN x.date and x.date + interval '1' year
must use the longest interval for which you need to calculate counts, in the above example: + interval '1' year
Upvotes: 2
Reputation: 701
This sounds too easy; am I missing something? You can specify all the intervals you want in PostgreSQL, like this:
select
'1 day'::interval,
'1 week'::interval,
'1 month'::interval,
'1 year'::interval;
interval | interval | interval | interval
----------+----------+----------+----------
1 day | 7 days | 1 mon | 1 year
(1 row)
Upvotes: -1