Lukas
Lukas

Reputation: 85

Query a series of consecutive events in MySQL

I have a table of events with a project and a timestamp. I want to query all series of consecutive projects. If a project occurs consecutively more than once, the project should be listed several times. I would also like to get the start and end time and the duration of each series.

Example:

| project   | created_at              |
|-----------|-------------------------|
| project a | 2020-05-29 10:00:00.000 |
| project a | 2020-05-29 10:00:01.167 |
| project a | 2020-05-29 10:00:03.954 |
| project a | 2020-05-29 10:00:10.055 |
| project b | 2020-05-29 10:05:00.000 |
| project b | 2020-05-29 10:06:01.049 |
| project b | 2020-05-29 10:06:30.197 |
| project a | 2020-05-29 10:07:05.167 |
| project a | 2020-05-29 10:07:18.680 |

I would like to receive the following output:

| project   | start                   | end                     | duration     |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:00:10.055 | 00:00:10.055 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30:197 |
| project a | 2020-05-29 10:07:05.167 | 2020-05-29 10:07:18.680 | 00:00:13.513 |

I have the following query so far:

SELECT 
project,
created_at AS "Start", 
Max(created_at) AS "End", 
TIMEDIFF(MAX(created_at), created_at) AS "Duration"
FROM results GROUP BY project;

This gives me the following output:

| project   | start                   | end                     | duration     |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:07:18.680 | 00:07:18.680 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30:197 |

The problem is that I only get two outputs through the group by. This in turn messes up the start and end date and duration to be output.

Is there a way around this so that I get my desired output?

Upvotes: 0

Views: 139

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269443

This is an example of a gaps-and-islands problem. The difference of row numbers should do what you want:

SELECT project, MIN(created_at) as start_dt, max(created_at) as end_dt
       TIMEDIFF(MAX(created_at), created_at) AS Duration
FROM (SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY project ORDER BY created_at) as seqnum_p,
             ROW_NUMBER() OVER (ORDER BY created_at) as seqnum
      FROM results r
     ) r
GROUP BY project, (seqnum - seqnum_p)
ORDER BY MIN(created_at);

Upvotes: 2

Related Questions