Reputation: 33
I have a table with status and date for every day and I'm trying to find out when the statuses change and if there's gaps within each status change / how many days were of a certain status.
+------+----------------+--------+
| user | date | status |
+----- +----------------+--------+
| 1 | 12/01/2017 | open |
| 1 | 12/02/2017 | open |
| 1 | 12/03/2017 | open |
| 1 | 12/04/2017 | closed |
| 1 | 12/05/2017 | closed |
| 1 | 12/06/2017 | open |
| 1 | 12/07/2017 | open |
+------+----------------+--------+
Expected output:
+------+------------+----------------+-------- ----+------------+
| user | status | days_in_status | min | max |
+----- +------------+----------------+-------------+------------+
| 1 | open | 3 | 12/01/2017 | 12/03/2017 |
| 1 | closed | 2 | 12/04/2017 | 12/05/2017 |
| 1 | open | 2 | 12/06/2017 | 12/07/2017 |
+------+------ -----+----------------+-------------+-- ---------+
Upvotes: 3
Views: 2529
Reputation: 153832
Fully contained example borrowing from Gordon's answer above that shows island widths as well as island gaps, copy and paste into postgresql and run it.
drop table if exists foobar;
CREATE TABLE foobar( tick text, date_val date );
insert into foobar values('XYZ', '2021-01-03'); --island 1 has width 2
insert into foobar values('XYZ', '2021-01-04'); --island 1
insert into foobar values('XYZ', '2021-05-09'); --island 2 has width 3
insert into foobar values('XYZ', '2021-05-10'); --island 2
insert into foobar values('XYZ', '2021-05-11'); --island 2
insert into foobar values('XYZ', '2021-07-07'); --island 3 has width 4
insert into foobar values('XYZ', '2021-07-08'); --island 3
insert into foobar values('XYZ', '2021-07-09'); --island 3
insert into foobar values('XYZ', '2021-07-10'); --island 3
insert into foobar values('XYZ', '2022-10-10'); --island 4 has width 1
select tick, island_width, min_val, max_val,
min_val - lag(max_val) over (order by max_val) as gap_width from
(
select tick, count(*) as island_width,
min(date_val) min_val, max(date_val) max_val
from (
select t.*,
row_number() over ( partition by tick order by date_val ) as seqnum
from foobar t where tick = 'XYZ'
) t
group by tick, date_val - seqnum * interval '1 day'
) t2 order by max_val desc
Prints:
┌──────┬──────────────┬────────────┬────────────┬───────────┐
│ tick │ island_width │ min_val │ max_val │ gap_width │
├──────┼──────────────┼────────────┼────────────┼───────────┤
│ XYZ │ 1 │ 2022-10-10 │ 2022-10-10 │ 457 │
│ XYZ │ 4 │ 2021-07-07 │ 2021-07-10 │ 57 │
│ XYZ │ 3 │ 2021-05-09 │ 2021-05-11 │ 125 │
│ XYZ │ 2 │ 2021-01-03 │ 2021-01-04 │ ¤ │
└──────┴──────────────┴────────────┴────────────┴───────────┘
Details: In this data there are 4 islands. There are 3 gaps between the 4 islands. The last island has a gap. The first island has a null gap.
The SQL finds the days between every min_date
and max_date
by subtracting a day and decrementing rownumber until all dates are accounted for. partition
somehow groups the islands by date and a 2 day moving window using over
and lag
with default lookbehind 1. Black Box Demon magic aside it goes fast enough and all unit tests pass.
With help from walkthrough here: https://bertwagner.com/posts/gaps-and-islands
drop table if exists foobar;
CREATE TABLE foobar( tick text, date_val timestamp);
insert into foobar values('XYZ', '2021-01-03 12:00:00');
insert into foobar values('XYZ', '2021-01-03 11:00:00');
insert into foobar values('XYZ', '2021-01-03 10:00:00');
insert into foobar values('XYZ', '2021-01-03 08:00:00'); --2 hr gap
insert into foobar values('XYZ', '2021-01-03 07:00:00');
insert into foobar values('XYZ', '2021-01-03 06:00:00');
insert into foobar values('XYZ', '2021-01-03 03:00:00'); --3 hr gap
select tick, island_width, min_val, max_val,
min_val - lag(max_val) over (order by max_val) as gap_width from
(
select tick, count(*) as island_width,
min(date_val) min_val, max(date_val) max_val
from (
select t.*,
row_number() over ( partition by tick order by date_val ) as seqnum
from foobar t where tick = 'XYZ'
) t
group by tick, date_val - seqnum * interval '1 hour'
) t2 order by max_val desc
Prints:
┌──────┬──────────────┬─────────────────────┬─────────────────────┬───────────┐
│ tick │ island_width │ min_val │ max_val │ gap_width │
├──────┼──────────────┼─────────────────────┼─────────────────────┼───────────┤
│ XYZ │ 3 │ 2021-01-03 10:00:00 │ 2021-01-03 12:00:00 │ 02:00:00 │
│ XYZ │ 3 │ 2021-01-03 06:00:00 │ 2021-01-03 08:00:00 │ 03:00:00 │
│ XYZ │ 1 │ 2021-01-03 03:00:00 │ 2021-01-03 03:00:00 │ ¤ │
└──────┴──────────────┴─────────────────────┴─────────────────────┴───────────┘
Upvotes: 1
Reputation: 656321
In Postgres you can subtract integer
from date
directly:
SELECT "user", status
, count(*) AS days_in_status, min(date) AS min, max(date) AS max
FROM (
SELECT "user", status, date
, date - row_number() OVER (PARTITION BY "user", status ORDER BY date)::int AS grp_date
FROM tbl
) t
GROUP BY "user", status, grp_date
ORDER BY "user", min;
db<>fiddle here
See:
Aside: user
is a reserved word, you would avoid it as actual column name.
Upvotes: 0
Reputation: 1269503
This is a type of gaps-and-islands problem. In this case, subtracting a sequential number from each day is probably the simplest solution for identifying the "islands":
select user, status, count(*) as num_days, min(date), max(date)
from (select t.*,
row_number() over (partition by user, status order by date) as seqnum
from t
) t
group by user, status, date - seqnum * interval '1 day'
Upvotes: 7