user3147973
user3147973

Reputation: 435

How to count rows using a variable date range provided by a table in PostgreSQL

I suspect I require some sort of windowing function to do this. I have the following item data as an example:

count | date
------+-----------
3     | 2017-09-15
9     | 2017-09-18
2     | 2017-09-19
6     | 2017-09-20
3     | 2017-09-21

So there are gaps in my data first off, and I have another query here:

select until_date, until_date - (lag(until_date)  over ()) as delta_days from ranges

Which I have generated the following data:

until_date | delta_days
-----------+-----------
2017-09-08 |
2017-09-11 | 3
2017-09-13 | 2
2017-09-18 | 5
2017-09-21 | 3
2017-09-22 | 1

So I'd like my final query to produce this result:

start_date | ending_date | total_items
-----------+-------------+--------------
2017-09-08 | 2017-09-10  | 0
2017-09-11 | 2017-09-12  | 0
2017-09-13 | 2017-09-17  | 3
2017-09-18 | 2017-09-20  | 15
2017-09-21 | 2017-09-22  | 3

Which tells me the total count of items from the first table, per day, based on the custom ranges from the second table.

In this particular example, I would be summing up total_items BETWEEN start AND end (since there would be overlap on the dates, I'd subtract 1 from the end date to not count duplicates)

Anyone know how to do this?

Thanks!

Upvotes: 1

Views: 512

Answers (1)

klin
klin

Reputation: 121634

Use the daterange type. Note that you do not have to calculate delta_days, just convert ranges to dataranges and use the operator <@ - element is contained by.

with counts(count, date) as (
values
    (3, '2017-09-15'::date),
    (9, '2017-09-18'),
    (2, '2017-09-19'),
    (6, '2017-09-20'),
    (3, '2017-09-21')
),
ranges (until_date) as (
values
    ('2017-09-08'::date),
    ('2017-09-11'),
    ('2017-09-13'),
    ('2017-09-18'),
    ('2017-09-21'),
    ('2017-09-22')
)
select daterange, coalesce(sum(count), 0) as total_items
from (
    select daterange(lag(until_date) over (order by until_date), until_date)
    from ranges
    ) s
left join counts on date <@ daterange
where not lower_inf(daterange)
group by 1
order by 1;

        daterange        | total_items 
-------------------------+-------------
 [2017-09-08,2017-09-11) |           0
 [2017-09-11,2017-09-13) |           0
 [2017-09-13,2017-09-18) |           3
 [2017-09-18,2017-09-21) |          17
 [2017-09-21,2017-09-22) |           3
(5 rows)    

Note, that in the dateranges above lower bounds are inclusive while upper bound are exclusive.

If you want to calculate items per day in the dateranges:

select 
    daterange, total_items, 
    round(total_items::dec/(upper(daterange)- lower(daterange)), 2) as items_per_day
from (
    select daterange, coalesce(sum(count), 0) as total_items
    from (
        select daterange(lag(until_date) over (order by until_date), until_date)
        from ranges
        ) s
    left join counts on date <@ daterange
    where not lower_inf(daterange)
    group by 1
    ) s
order by 1

        daterange        | total_items | items_per_day 
-------------------------+-------------+---------------
 [2017-09-08,2017-09-11) |           0 |          0.00
 [2017-09-11,2017-09-13) |           0 |          0.00
 [2017-09-13,2017-09-18) |           3 |          0.60
 [2017-09-18,2017-09-21) |          17 |          5.67
 [2017-09-21,2017-09-22) |           3 |          3.00
(5 rows)

Upvotes: 1

Related Questions