sazary
sazary

Reputation: 966

How to get maximum number of concurrent events in postgresql?

I have a table named events like this:

id: int
source_id: int
start_datetime: timestamp
end_datetime: timestamp  

These events could have overlaps, and I want to know maximum number of overlapping events that have occurred over a time span. For example, in a situation like this:

id | source_id | start_datetime     | end_datetime
----------------------------------------------------------
1  | 23        | 2017-1-1T10:20:00  | 2017-1-1T10:40:00
1  | 42        | 2017-1-1T10:30:00  | 2017-1-1T10:35:00
1  | 11        | 2017-1-1T10:37:00  | 2017-1-1T10:50:00  

The answer is 2, because at most 2 events overlap at 10:30 until 10:35.
I'm using Postgres 9.6

Upvotes: 4

Views: 1533

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270613

Here is the idea: count the number of starts and subtract the number of stops. That gives the net amount at each time. The rest is just aggregation:

with e as (
      select start_datetime as dte, 1 as inc
      from events
      union all
      select end_datetime as dte, -1 as inc
      from events
     )
select max(concurrent)
from (select dte, sum(sum(inc)) over (order by dte) as concurrent
      from e
      group by dte
     ) e;

The subquery shows the number of overlapping events at each time.

You can get the time frame as:

select dte, next_dte, concurrent
from (select dte, sum(sum(inc)) over (order by dte) as concurrent,
             lead(dte) over (partition by dte) as next_dte
      from e
      group by dte
     ) e
order by concurrent desc
fetch first 1 row only;

Upvotes: 9

user330315
user330315

Reputation:

I am not entirely sure how the id and source_id column should be treated, but from your description, something like that maybe:

select e1.source_id, 
       count(distinct e2.source_id) as overlap_count, 
       array_agg(e2.source_id) as overlap_events
from events e1
  join events e2 
    on e1.source_id <> e2.source_id
   and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime) 
group by e1.source_id
order by overlap_count desc;

Given your sample data, that returns the following:

source_id | overlap_count | overlap_events
----------+---------------+---------------
       23 |             2 | {42,11}       
       11 |             1 | {23}          
       42 |             1 | {23}          

To only get the maximum row, you could add a limit 1 to the query.

Another (probably slower) option, if you need the complete row from the events table:

select e1.id, e1.source_id, e1.start_datetime, e1.end_datetime, 
       (select count(*)
        from events e2
        where e2.source_id <> e1.source_id
          and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime)
       )  as overlap_count
from events e1
order by overlap_count desc;

Another option is to use range types and the && operator instead of overlaps:

select e1.source_id, 
       count(distinct e2.source_id) as overlap_count, 
       array_agg(e2.source_id) as overlap_events
from events e1
  join events e2 on e1.source_id <> e2.source_id
             and tsrange(e1.start_datetime, e1.end_datetime,'[]') && tsrange(e2.start_datetime, e2.end_datetime, '[]') 
group by e1.source_id
order by overlap_count desc;

Upvotes: 1

Related Questions