Reputation: 966
I have a table named events
like this:
id: int
source_id: int
start_datetime: timestamp
end_datetime: timestamp
These events could have overlaps, and I want to know maximum number of overlapping events that have occurred over a time span. For example, in a situation like this:
id | source_id | start_datetime | end_datetime
----------------------------------------------------------
1 | 23 | 2017-1-1T10:20:00 | 2017-1-1T10:40:00
1 | 42 | 2017-1-1T10:30:00 | 2017-1-1T10:35:00
1 | 11 | 2017-1-1T10:37:00 | 2017-1-1T10:50:00
The answer is 2, because at most 2 events overlap at 10:30 until 10:35.
I'm using Postgres 9.6
Upvotes: 4
Views: 1533
Reputation: 1270613
Here is the idea: count the number of starts and subtract the number of stops. That gives the net amount at each time. The rest is just aggregation:
with e as (
select start_datetime as dte, 1 as inc
from events
union all
select end_datetime as dte, -1 as inc
from events
)
select max(concurrent)
from (select dte, sum(sum(inc)) over (order by dte) as concurrent
from e
group by dte
) e;
The subquery shows the number of overlapping events at each time.
You can get the time frame as:
select dte, next_dte, concurrent
from (select dte, sum(sum(inc)) over (order by dte) as concurrent,
lead(dte) over (partition by dte) as next_dte
from e
group by dte
) e
order by concurrent desc
fetch first 1 row only;
Upvotes: 9
Reputation:
I am not entirely sure how the id
and source_id
column should be treated, but from your description, something like that maybe:
select e1.source_id,
count(distinct e2.source_id) as overlap_count,
array_agg(e2.source_id) as overlap_events
from events e1
join events e2
on e1.source_id <> e2.source_id
and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime)
group by e1.source_id
order by overlap_count desc;
Given your sample data, that returns the following:
source_id | overlap_count | overlap_events
----------+---------------+---------------
23 | 2 | {42,11}
11 | 1 | {23}
42 | 1 | {23}
To only get the maximum row, you could add a limit 1
to the query.
Another (probably slower) option, if you need the complete row from the events table:
select e1.id, e1.source_id, e1.start_datetime, e1.end_datetime,
(select count(*)
from events e2
where e2.source_id <> e1.source_id
and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime)
) as overlap_count
from events e1
order by overlap_count desc;
Another option is to use range types and the &&
operator instead of overlaps
:
select e1.source_id,
count(distinct e2.source_id) as overlap_count,
array_agg(e2.source_id) as overlap_events
from events e1
join events e2 on e1.source_id <> e2.source_id
and tsrange(e1.start_datetime, e1.end_datetime,'[]') && tsrange(e2.start_datetime, e2.end_datetime, '[]')
group by e1.source_id
order by overlap_count desc;
Upvotes: 1