Zach Conrad
Zach Conrad

Reputation: 21

PostgreSQL query for determining cohorts of data by timestamp and name

I have data that has a timestamp and a name like:

ts name
2021-10-01 00:00:00 Place A (123)
2021-10-01 00:01:00 Place A (123)
2021-10-01 00:06:00 Place A (123)
2021-10-01 00:10:00 Place B (234)
2021-10-01 00:13:00 Place B (234)
2021-10-01 00:15:00 Place C (345)
2021-10-01 00:18:00 Place C (345)
2021-10-01 00:23:00 Place C (345)
2021-10-01 00:27:00 Place C (345)
2021-10-01 00:28:00 Place C (345)
2021-10-01 00:29:00 Place C (345)
2021-10-01 00:30:00 Place A (123)
2021-10-01 00:33:00 Place A (123)

I am wanting to build a query that will allow for finding "cohorts" of process cycle sessions where:

The end result should be something like:

min_data_timestamp max_data_timestamp name
2021-10-01 00:00:00 2021-10-01 00:06:00 Place A (123)
2021-10-01 00:10:00 2021-10-01 00:13:00 Place B (123)
2021-10-01 00:15:00 2021-10-01 00:29:00 Place C (123)
2021-10-01 00:30:00 2021-10-01 00:33:00 Place A (123)

I am assuming some type of windowing query/CTE to do this. I have seen other examples that find start/end time overall for a name or something similar, but not where the name is repeated throughout.

EDIT had a few typos

Upvotes: 2

Views: 80

Answers (1)

Ajax1234
Ajax1234

Reputation: 71451

You can use row_number with group by:

with pr as (
   select row_number() over (order by id) r, id, name from processes
),
pr1 as (
   select p.*, (select sum(case when p1.r < p.r and p1.name != p.name then 1 end) from pr p1) gid from pr p
)
select min(p.id), max(p.id), max(p.name)
from pr1 p group by p.gid order by case when p.gid is null then 1 else p.gid end;

Output:

min_data_timestamp max_data_timestamp name
2021-10-01 00:00:00 2021-10-01 00:06:00 Place A (123)
2021-10-01 00:10:00 2021-10-01 00:13:00 Place B (234)
2021-10-01 00:15:00 2021-10-01 00:29:00 Place C (345)
2021-10-01 00:30:00 2021-10-01 00:33:00 Place A (123)

Upvotes: 1

Related Questions