Reputation: 85
I have a table of events with a project and a timestamp. I want to query all series of consecutive projects. If a project occurs consecutively more than once, the project should be listed several times. I would also like to get the start and end time and the duration of each series.
Example:
| project | created_at |
|-----------|-------------------------|
| project a | 2020-05-29 10:00:00.000 |
| project a | 2020-05-29 10:00:01.167 |
| project a | 2020-05-29 10:00:03.954 |
| project a | 2020-05-29 10:00:10.055 |
| project b | 2020-05-29 10:05:00.000 |
| project b | 2020-05-29 10:06:01.049 |
| project b | 2020-05-29 10:06:30.197 |
| project a | 2020-05-29 10:07:05.167 |
| project a | 2020-05-29 10:07:18.680 |
I would like to receive the following output:
| project | start | end | duration |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:00:10.055 | 00:00:10.055 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30:197 |
| project a | 2020-05-29 10:07:05.167 | 2020-05-29 10:07:18.680 | 00:00:13.513 |
I have the following query so far:
SELECT
project,
created_at AS "Start",
Max(created_at) AS "End",
TIMEDIFF(MAX(created_at), created_at) AS "Duration"
FROM results GROUP BY project;
This gives me the following output:
| project | start | end | duration |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:07:18.680 | 00:07:18.680 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30:197 |
The problem is that I only get two outputs through the group by. This in turn messes up the start and end date and duration to be output.
Is there a way around this so that I get my desired output?
Upvotes: 0
Views: 139
Reputation: 1269443
This is an example of a gaps-and-islands problem. The difference of row numbers should do what you want:
SELECT project, MIN(created_at) as start_dt, max(created_at) as end_dt
TIMEDIFF(MAX(created_at), created_at) AS Duration
FROM (SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY project ORDER BY created_at) as seqnum_p,
ROW_NUMBER() OVER (ORDER BY created_at) as seqnum
FROM results r
) r
GROUP BY project, (seqnum - seqnum_p)
ORDER BY MIN(created_at);
Upvotes: 2