Reputation: 21
I have data that has a timestamp and a name like:
ts | name |
---|---|
2021-10-01 00:00:00 | Place A (123) |
2021-10-01 00:01:00 | Place A (123) |
2021-10-01 00:06:00 | Place A (123) |
2021-10-01 00:10:00 | Place B (234) |
2021-10-01 00:13:00 | Place B (234) |
2021-10-01 00:15:00 | Place C (345) |
2021-10-01 00:18:00 | Place C (345) |
2021-10-01 00:23:00 | Place C (345) |
2021-10-01 00:27:00 | Place C (345) |
2021-10-01 00:28:00 | Place C (345) |
2021-10-01 00:29:00 | Place C (345) |
2021-10-01 00:30:00 | Place A (123) |
2021-10-01 00:33:00 | Place A (123) |
I am wanting to build a query that will allow for finding "cohorts" of process cycle sessions where:
The end result should be something like:
min_data_timestamp | max_data_timestamp | name |
---|---|---|
2021-10-01 00:00:00 | 2021-10-01 00:06:00 | Place A (123) |
2021-10-01 00:10:00 | 2021-10-01 00:13:00 | Place B (123) |
2021-10-01 00:15:00 | 2021-10-01 00:29:00 | Place C (123) |
2021-10-01 00:30:00 | 2021-10-01 00:33:00 | Place A (123) |
I am assuming some type of windowing query/CTE to do this. I have seen other examples that find start/end time overall for a name or something similar, but not where the name is repeated throughout.
EDIT had a few typos
Upvotes: 2
Views: 80
Reputation: 71451
You can use row_number
with group by
:
with pr as (
select row_number() over (order by id) r, id, name from processes
),
pr1 as (
select p.*, (select sum(case when p1.r < p.r and p1.name != p.name then 1 end) from pr p1) gid from pr p
)
select min(p.id), max(p.id), max(p.name)
from pr1 p group by p.gid order by case when p.gid is null then 1 else p.gid end;
Output:
min_data_timestamp | max_data_timestamp | name |
---|---|---|
2021-10-01 00:00:00 | 2021-10-01 00:06:00 | Place A (123) |
2021-10-01 00:10:00 | 2021-10-01 00:13:00 | Place B (234) |
2021-10-01 00:15:00 | 2021-10-01 00:29:00 | Place C (345) |
2021-10-01 00:30:00 | 2021-10-01 00:33:00 | Place A (123) |
Upvotes: 1