Reputation: 85
I'm working with PostgreSQL 12, but the question is standard SQL. I have a table like this:
| timestamp | raw_value |
| ------------------------ | --------- |
| 2015-06-27T03:52:50.000Z | 0 |
| 2015-06-27T03:53:00.000Z | 0 |
| 2015-06-27T03:53:10.000Z | 1 |
| 2015-06-27T03:53:20.000Z | 1 |
| 2015-06-27T04:22:20.000Z | 1 |
| 2015-06-27T04:22:30.000Z | 0 |
| 2015-06-27T05:33:40.000Z | 1 |
| 2015-06-27T05:33:50.000Z | 1 |
I need to get the first and last timestamp of each group with raw_value = 1, i.e. needed result :
| start_time | end_time |
| ------------------------ | ------------------------ |
| 2015-06-27T03:53:10.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:50.000Z |
My best effort so far looks like this:
SELECT timestamp, raw_value, row_number() over w as rn, first_value(obt) OVER w AS start_time, last_value(obt) OVER w AS end_time
FROM mytable
WINDOW w AS (PARTITION BY raw_value ORDER BY timestamp GROUPS CURRENT ROW )
ORDER BY timestamp;
Google doesn't have much info about it, but according to the docs the "GROUPS" clause is exactly what I need, but the end result is wrong, because window functions simply copy value from the timestamp column:
| timestamp | raw_value | rn | start_time | end_time |
| ------------------------ | --------- | --- | ------------------------ | ------------------------ |
| 2015-06-27T03:52:50.000Z | 0 | 1 | 2015-06-27T03:52:50.000Z | 2015-06-27T03:52:50.000Z |
| 2015-06-27T03:53:00.000Z | 0 | 2 | 2015-06-27T03:53:00.000Z | 2015-06-27T03:53:00.000Z |
| 2015-06-27T03:53:10.000Z | 1 | 1 | 2015-06-27T03:53:10.000Z | 2015-06-27T03:53:10.000Z |
| 2015-06-27T03:53:20.000Z | 1 | 2 | 2015-06-27T03:53:20.000Z | 2015-06-27T03:53:20.000Z |
| 2015-06-27T04:22:20.000Z | 1 | 3 | 2015-06-27T04:22:20.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T04:22:30.000Z | 0 | 3 | 2015-06-27T04:22:30.000Z | 2015-06-27T04:22:30.000Z |
| 2015-06-27T05:33:40.000Z | 1 | 4 | 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:40.000Z |
| 2015-06-27T05:33:50.000Z | 1 | 5 | 2015-06-27T05:33:50.000Z | 2015-06-27T05:33:50.000Z |
At line#6 I'd expect the row number to reset to 1, but it doesn't! I tried using BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
as well without luck.
I have created a DB Fiddle link for your convenience as well.
If there is any other way to achieve the same result in SQL (ok to be PG-specific) without window functions, I'd like to know.
Upvotes: 2
Views: 444
Reputation: 12000
Identify groups using row_number() - sum()
trick, then choose min and max time for each identified group.
with grp as (
select obt, raw_value
, row_number() over w - sum(raw_value) over w as g
from tm_series
window w as (order by obt)
)
select min(obt), max(obt)
from grp
where raw_value = 1
group by g;
DB fiddle here.
(The GROUPS
clause depends on window ordering and seems to have nothing common with your problem.)
Upvotes: 2
Reputation: 12494
Your updated fiddle here.
For an gaps and islands approach, first mark your transitions from raw_value = 0
to raw_value = 1
with mark_changes as (
select obt, raw_value,
case
when raw_value = 0 then 0
when raw_value = lag(raw_value) over (order by obt) then 0
else 1
end as transition
from tm_series
),
Keep only the raw_value = 1
rows, and sum()
the preceding transition
markers to place each row into a group.
id_groups as (
select obt, raw_value,
sum(transition) over (order by obt) as grp_num
from mark_changes
where raw_value = 1
)
Use group by
on these grp_num
values to get your desired result.
select min(obt) as start_time,
max(obt) as end_time
from id_groups
group by grp_num
order by min(obt);
Upvotes: 2