khairul
khairul

Reputation: 1260

Count cumulative total in Postgresql

I am using count and group by to get the number of subscribers registered each day:

  SELECT created_at, COUNT(email)  
    FROM subscriptions 
GROUP BY created at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers every day instead. How do I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500

Upvotes: 79

Views: 92038

Answers (6)

Pstr
Pstr

Reputation: 857

To anyone seeing this answer today (2021) You can use rollup

SELECT created_at, COUNT(email)  
    FROM subscriptions 
GROUP BY rollup(created_at);

this will give you a new row with the total

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300
NULL        500

You can also use rollup for partial results if you have more than one parameter to show in your group by. If you have a created_by for instance:

SELECT created_at, created_by COUNT(email)  
    FROM subscriptions 
GROUP BY rollup(created_at, created_by);

this will give you a new row with the total

created_at  created_by  count
-----------------------------
04-04-2011     1        80
04-04-2011     2        20
04-04-2021    NULL      100
05-04-2011     1        20
05-04-2011     2        30
05-04-2011    NULL      50
NULL          NULL      150

I only took the numbers of the first two days, but that's the idea. it will show grouped by date, then total of that day, then the total of totals.

Order matters in the rollup() here, as to how the partial totals will be displayed

Upvotes: 6

intgr
intgr

Reputation: 20466

With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.

This is what it looks like:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.


Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.


(If you want to test, this is how I created the sample dataset)

create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          '[email protected]' || (i%700000)::text as email
   from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);

Upvotes: 132

mentat
mentat

Reputation: 1

The best way is to have a calendar table: calendar ( date date, month int, quarter int, half int, week int, year int )

Then, you can join this table to make summary for the field you need.

Upvotes: -3

Endy Tjahjono
Endy Tjahjono

Reputation: 24450

I assume you want only one row per day and you want to still show days without any subscriptions (suppose nobody subscribes for a certain date, do you want to show that date with the balance of the previous day?). If this is the case, you can use the 'with' feature:

with recursive serialdates(adate) as (
    select cast('2011-04-04' as date)
    union all
    select adate + 1 from serialdates where adate < cast('2011-04-07' as date)
)
select D.adate,
(
    select count(distinct email)
    from subscriptions
    where created_at between date_trunc('month', D.adate) and D.adate
)
from serialdates D

Upvotes: 2

Andriy M
Andriy M

Reputation: 77657

SELECT
  s1.created_at,
  COUNT(s2.email) AS cumul_count
FROM subscriptions s1
  INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at

Upvotes: 2

OMG Ponies
OMG Ponies

Reputation: 332541

Use:

SELECT a.created_at,
       (SELECT COUNT(b.email)
          FROM SUBSCRIPTIONS b
         WHERE b.created_at <= a.created_at) AS count
  FROM SUBSCRIPTIONS a

Upvotes: 9

Related Questions