Reputation: 1260
I am using count
and group by
to get the number of subscribers registered each day:
SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY created at;
Result:
created_at count
-----------------
04-04-2011 100
05-04-2011 50
06-04-2011 50
07-04-2011 300
I want to get the cumulative total of subscribers every day instead. How do I get this?
created_at count
-----------------
04-04-2011 100
05-04-2011 150
06-04-2011 200
07-04-2011 500
Upvotes: 79
Views: 92038
Reputation: 857
To anyone seeing this answer today (2021) You can use rollup
SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY rollup(created_at);
this will give you a new row with the total
created_at count
-----------------
04-04-2011 100
05-04-2011 50
06-04-2011 50
07-04-2011 300
NULL 500
You can also use rollup for partial results if you have more than one parameter to show in your group by. If you have a created_by
for instance:
SELECT created_at, created_by COUNT(email)
FROM subscriptions
GROUP BY rollup(created_at, created_by);
this will give you a new row with the total
created_at created_by count
-----------------------------
04-04-2011 1 80
04-04-2011 2 20
04-04-2021 NULL 100
05-04-2011 1 20
05-04-2011 2 30
05-04-2011 NULL 50
NULL NULL 150
I only took the numbers of the first two days, but that's the idea. it will show grouped by date, then total of that day, then the total of totals.
Order matters in the rollup()
here, as to how the partial totals will be displayed
Upvotes: 6
Reputation: 20466
With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.
This is what it looks like:
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;
Here OVER
creates the window; ORDER BY created_at
means that it has to sum up the counts in created_at
order.
Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email))
. Unfortunately this won't remove duplicates that cross different dates.
If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON
. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
SELECT DISTINCT ON (email) created_at, email
FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;
If you create an index on (email, created_at)
, this query shouldn't be too slow either.
(If you want to test, this is how I created the sample dataset)
create table subscriptions as
select date '2000-04-04' + (i/10000)::int as created_at,
'[email protected]' || (i%700000)::text as email
from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
Upvotes: 132
Reputation: 1
The best way is to have a calendar table: calendar ( date date, month int, quarter int, half int, week int, year int )
Then, you can join this table to make summary for the field you need.
Upvotes: -3
Reputation: 24450
I assume you want only one row per day and you want to still show days without any subscriptions (suppose nobody subscribes for a certain date, do you want to show that date with the balance of the previous day?). If this is the case, you can use the 'with' feature:
with recursive serialdates(adate) as (
select cast('2011-04-04' as date)
union all
select adate + 1 from serialdates where adate < cast('2011-04-07' as date)
)
select D.adate,
(
select count(distinct email)
from subscriptions
where created_at between date_trunc('month', D.adate) and D.adate
)
from serialdates D
Upvotes: 2
Reputation: 77657
SELECT
s1.created_at,
COUNT(s2.email) AS cumul_count
FROM subscriptions s1
INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at
Upvotes: 2
Reputation: 332541
Use:
SELECT a.created_at,
(SELECT COUNT(b.email)
FROM SUBSCRIPTIONS b
WHERE b.created_at <= a.created_at) AS count
FROM SUBSCRIPTIONS a
Upvotes: 9