TunaFishLies

Reputation: 107

How to compute frequency of concurrent events by combination in MySQL?

I am looking for a way to identify event names that co-occur, i.e., to correlate event names that share the same start (startts) and end (endts) times. The events are exactly concurrent (partial overlap does not occur in this database).

Toy data:

+------+---------+-------+
| name | startts | endts |
+------+---------+-------+
| A    | 02:20   | 02:23 |
| A    | 02:23   | 02:25 |
| A    | 02:27   | 02:28 |
| B    | 02:20   | 02:23 |
| B    | 02:23   | 02:25 |
| B    | 02:25   | 02:27 |
| C    | 02:27   | 02:28 |
| D    | 02:27   | 02:28 |
| D    | 02:28   | 02:31 |
| E    | 02:27   | 02:28 |
| E    | 02:29   | 02:31 |
+------+---------+-------+

Ideal output:


+-------------+-------+
| combination | count |
+-------------+-------+
| AB          | 2     |
| AC          | 1     |
| AE          | 1     |
| AD          | 1     |
| BC          | 0     |
| BD          | 0     |
| BE          | 0     |
| CE          | 0     |
+-------------+-------+

Naturally, I would have tried a loop, but I recognize that MySQL is not well suited to that.

What I've tried is generating a temporary table of distinct (name, startts, endts) combinations and then left-joining that table to itself (selecting name).

Thank you.

Upvotes: 1

Views: 108

Answers (1)

GMB

Reputation: 222432

I understand this as a self-join, aggregation, and a conditional count of matching intervals:

select t1.name name1, t2.name name2,
    sum(t1.startts = t2.startts and t1.endts = t2.endts) cnt
from mytable t1
inner join mytable t2 on t2.name > t1.name
group by t1.name, t2.name
order by t1.name, t2.name

Demo on DB Fiddle:

name1 | name2 | cnt
:---- | :---- | --:
A     | B     |   2
A     | C     |   1
A     | D     |   1
A     | E     |   1
B     | C     |   0
B     | D     |   0
B     | E     |   0
C     | D     |   1
C     | E     |   1
D     | E     |   1
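
For anyone who wants to reproduce this without a MySQL server, here is a minimal sketch that loads the toy data from the question into an in-memory SQLite database and runs the same query. SQLite standing in for MySQL is an assumption made purely for convenience: both evaluate a comparison to 0 or 1, which is what lets sum() act as a conditional count. Storing the times as zero-padded text is also an assumption; a real table would more likely use a TIME or DATETIME column.

```python
import sqlite3

# Toy data from the question: (name, startts, endts).
# Assumption: times are kept as zero-padded "HH:MM" text.
rows = [
    ("A", "02:20", "02:23"), ("A", "02:23", "02:25"), ("A", "02:27", "02:28"),
    ("B", "02:20", "02:23"), ("B", "02:23", "02:25"), ("B", "02:25", "02:27"),
    ("C", "02:27", "02:28"),
    ("D", "02:27", "02:28"), ("D", "02:28", "02:31"),
    ("E", "02:27", "02:28"), ("E", "02:29", "02:31"),
]

# Assumption: SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (name text, startts text, endts text)")
conn.executemany("insert into mytable values (?, ?, ?)", rows)

# Same query as above: the join condition t2.name > t1.name pairs each
# name with every later name exactly once (A/B but not B/A or A/A),
# and sum() adds 1 for every pair of rows with identical start and end.
query = """
select t1.name name1, t2.name name2,
    sum(t1.startts = t2.startts and t1.endts = t2.endts) cnt
from mytable t1
inner join mytable t2 on t2.name > t1.name
group by t1.name, t2.name
order by t1.name, t2.name
"""

pair_counts = conn.execute(query).fetchall()
for name1, name2, cnt in pair_counts:
    print(name1, name2, cnt)
```

Because the count comes from a sum() over every joined row rather than from a filter, pairs that never coincide (such as B and C) still appear with a count of 0.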

Note that, if you are looking for a count of overlapping intervals, all you have to do is change the sum() to:

sum(t1.startts <= t2.endts and t1.endts >= t2.startts) cnt
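
One thing to be aware of with this variant: because the comparisons are inclusive (<= and >=), two intervals that merely touch, such as 02:20-02:23 and 02:23-02:25, are counted as overlapping. The sketch below compares the inclusive condition with a strict one (< and >) on the toy data. It uses an in-memory SQLite database as a stand-in for MySQL, and count_pairs is a hypothetical helper introduced only for this illustration; the zero-padded "HH:MM" text columns are likewise an assumption.

```python
import sqlite3

# Toy data from the question: (name, startts, endts).
# Assumption: zero-padded "HH:MM" text, which orders correctly as strings.
rows = [
    ("A", "02:20", "02:23"), ("A", "02:23", "02:25"), ("A", "02:27", "02:28"),
    ("B", "02:20", "02:23"), ("B", "02:23", "02:25"), ("B", "02:25", "02:27"),
    ("C", "02:27", "02:28"),
    ("D", "02:27", "02:28"), ("D", "02:28", "02:31"),
    ("E", "02:27", "02:28"), ("E", "02:29", "02:31"),
]

# Assumption: SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (name text, startts text, endts text)")
conn.executemany("insert into mytable values (?, ?, ?)", rows)


def count_pairs(condition):
    """Hypothetical helper: return {(name1, name2): cnt} for a condition on t1/t2.

    The condition is pasted into the SQL text, so it must only ever be a
    fixed, trusted string like the two used below.
    """
    query = f"""
    select t1.name, t2.name, sum({condition}) cnt
    from mytable t1
    inner join mytable t2 on t2.name > t1.name
    group by t1.name, t2.name
    """
    return {(n1, n2): cnt for n1, n2, cnt in conn.execute(query)}


# Inclusive bounds: intervals that share only an endpoint count as overlapping.
inclusive = count_pairs("t1.startts <= t2.endts and t1.endts >= t2.startts")
# Strict bounds: intervals must share more than a single instant.
strict = count_pairs("t1.startts < t2.endts and t1.endts > t2.startts")

for pair in sorted(inclusive):
    print(pair, "inclusive:", inclusive[pair], "strict:", strict[pair])
```

On this data the two conditions disagree wherever intervals touch end to start: for A and B, for example, the inclusive form counts 6 row pairs and the strict form counts 2. Which one is appropriate depends on whether an event ending at 02:23 should count as concurrent with one starting at 02:23.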

Upvotes: 1
