Insworn
Insworn

Reputation: 71

Postgresql aggregate values based on date

I have a table with following columns:

**name** | **start_date** | **termination_date** |  
X1 | 2019-01-01 | NULL  
X2 | 2018-05-01 | 2019-03-01 

I want to figure out how many people have been subscribed per month. Meaning that the start_date was before month X and the termination_date was either afterwards or is null.
E.g. For '2019-01-01' I want every row where the start_date was before or at '2019-01-01' and the termination_date is either after '2019-01-01' or Null

I tried it with this query:

SELECT start_date,
       sum(count(distinct name)) OVER (order by start_date)
from Table
where
    is_active = TRUE
    and (termination_date >= start_date or termination_date is null)
Group by start_date
Order by start_date;

However, when I checked some of the results by hand I saw that it was not returning the correct values but the sum() is to high. I have no idea why this happens.

Can anyone point me in the rigth direction?

Upvotes: 1

Views: 176

Answers (1)

S-Man
S-Man

Reputation: 23676

demo:db<>fiddle

SELECT
    gs::date,
    COUNT(*)
FROM
    subscription,
    generate_series(                                             -- 1
        date_trunc('month', start_date),                         -- 2
        date_trunc('month', COALESCE(termination_date, now())),  -- 3
        interval '1 month'
    ) gs
GROUP BY gs::date                                                -- 4
ORDER BY 1
  1. generate_series() generates dates in a certain interval. In this case every month.
  2. The start is the first day of the start month (calculated by date_trunc()).
  3. The end is the first day of the termination month. If there is none, the current day is taken (COALESCE())
  4. This calculation is done for every record separately. So the dates can be grouped afterwards.

Of course you can format your month using to_char(): demo

Upvotes: 2

Related Questions