Reputation: 83
I have a table like this (actually, it is the result of a very large query):
id | date_measured | out_of_range
-----+-----------------------+--------------
3147 | 09/08/2019 20.00:00 | 1
3147 | 09/08/2019 21.00:00 | 0
3147 | 09/08/2019 22.00:00 | 0
3147 | 09/08/2019 23.00:00 | 1
3147 | 10/08/2019 00.00:00 | 1
3147 | 10/08/2019 01.00:00 | 1
3147 | 10/08/2019 02.00:00 | 0
3125 | 09/08/2019 20.00:00 | 0
3125 | 09/08/2019 21.00:00 | 1
3125 | 09/08/2019 22.00:00 | 1
3125 | 09/08/2019 23.00:00 | 0
3125 | 10/08/2019 00.00:00 | 1
3125 | 10/08/2019 01.00:00 | 1
3125 | 10/08/2019 02.00:00 | 1
and I need this result:
id | date_measured_start | date_measured_end | consecutive_out_of_range
-----+-----------------------+-----------------------+--------------------------
3147 | 09/08/2019 20.00:00 | 09/08/2019 20.00:00 | 1
3147 | 09/08/2019 23.00:00 | 10/08/2019 01.00:00 | 3
3125 | 09/08/2019 21.00:00 | 09/08/2019 22.00:00 | 2
3125 | 10/08/2019 00.00:00 | 10/08/2019 02.00:00 | 3
that is, each consecutive run of the value out_of_range = 1 together with its start and end dates.
I tried to use this solution, but I just can't get only the consecutive 1s for the out_of_range value.
Upvotes: 0
Views: 363
Reputation:
Here is a different application of the same method as in MT0's answer. The method is known as the "fixed differences" method (the "fixed difference", in both solutions, is the additional computed value by which we group the data); it is also known as the "tabibitosan" method.
In this solution I subtract a row_number() (appropriately modified) directly from the date, but after selecting just the rows with the flag equal to 1. This may be important if you have a very large amount of data, but only a relatively small fraction of rows have the flag equal to 1. This is because row_number() needs to order the data, and ordering is an expensive operation. To solve the problem, we don't need to order (by date) the rows where the flag is 0 - only the rows where the flag is 1.
EDIT (based on MT0's comment below this answer)
MT0 points out, correctly, that my solution assumes something that is true in the test data posted by the OP but not stated explicitly: namely, that the date-times in the date_measured column form continuous sequences spaced at one-hour intervals.
In fact, what my solution really does is this. Suppose that from the very beginning the data consisted only of the out-of-range rows (with flag equal to 1), and that the date-times in the date_measured column were always rounded to the hour, as they are in the OP's test data. The question, then, would be to identify the sequences of rows where the times are "consecutive" (meaning one hour apart). That's what the query does. (See the hand-worked sketch right after this note for what happens when an hour is missing.)
END EDIT
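To make that assumption concrete, here is a hand-worked sketch on hypothetical rows (these rows are not in the OP's data): suppose an id has out-of-range readings at 23:00 and 01:00 and no row at all, flagged or not, for 00:00.
date_measured       | rn | grp = date_measured - rn * interval '1' hour
--------------------+----+---------------------------------------------
09/08/2019 23:00:00 |  1 | 09/08/2019 22:00:00
10/08/2019 01:00:00 |  2 | 09/08/2019 23:00:00
The two grp values (the grouping value computed in the query below) differ, so my query reports two runs of length 1. MT0's row-number-difference query in the other answer looks only at row order, not at the time gap, so it would report a single run of length 2 for the same rows.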
I used MT0's table - from his db fiddle test. Thanks MT0!
with
  tabibitosan (id, date_measured, grp) as (
    select id, date_measured,
           date_measured
             - row_number() over (partition by id order by date_measured)
               * interval '1' hour
    from   table_name
    where  out_of_range = 1
  )
select   id, min(date_measured) as date_measured_start,
         max(date_measured) as date_measured_end,
         count(*) as consecutive_out_of_range
from     tabibitosan
group by id, grp
order by id, date_measured_start -- or whatever
;
ID DATE_MEASURED_START DATE_MEASURED_END CONSECUTIVE_OUT_OF_RANGE
---- ------------------- ----------------- ------------------------
3125 2019-08-09 21:00 2019-08-09 22:00 2
3125 2019-08-10 00:00 2019-08-10 02:00 3
3147 2019-08-09 20:00 2019-08-09 20:00 1
3147 2019-08-09 23:00 2019-08-10 01:00 3
Upvotes: 1
Reputation: 168361
Use the ROW_NUMBER analytic function to give each row two incrementing numeric values - one per id and the other per id/out_of_range pair. If you subtract one from the other then the resulting number will be constant within a consecutive set of rows with the same id/out_of_range values and you can use this to GROUP BY:
Query:
SELECT id,
       MIN( date_measured ) AS date_measured_start,
       MAX( date_measured ) AS date_measured_end,
       COUNT( * ) AS consecutive_out_of_range
FROM   (
         SELECT t.*,
                ROW_NUMBER() OVER ( PARTITION BY id ORDER BY date_measured )
                - ROW_NUMBER() OVER ( PARTITION BY id, out_of_range ORDER BY date_measured )
                  AS rn
         FROM   table_name t
       )
WHERE  out_of_range = 1
GROUP BY id, rn
Output:
  ID | DATE_MEASURED_START | DATE_MEASURED_END   | CONSECUTIVE_OUT_OF_RANGE
-----+---------------------+---------------------+--------------------------
3147 | 2019-08-09 20:00:00 | 2019-08-09 20:00:00 |                        1
3147 | 2019-08-09 23:00:00 | 2019-08-10 01:00:00 |                        3
3125 | 2019-08-10 00:00:00 | 2019-08-10 02:00:00 |                        3
3125 | 2019-08-09 21:00:00 | 2019-08-09 22:00:00 |                        2
db<>fiddle here
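If it helps to see the intermediate step, here is the inner query with the two row numbers broken out (the rn_per_id and rn_per_id_flag aliases are added here only for illustration); the values are worked out by hand for id = 3147 from the sample data.
SELECT t.*,
       ROW_NUMBER() OVER ( PARTITION BY id ORDER BY date_measured )                 AS rn_per_id,
       ROW_NUMBER() OVER ( PARTITION BY id, out_of_range ORDER BY date_measured )   AS rn_per_id_flag,
       ROW_NUMBER() OVER ( PARTITION BY id ORDER BY date_measured )
       - ROW_NUMBER() OVER ( PARTITION BY id, out_of_range ORDER BY date_measured ) AS rn
FROM   table_name t
ORDER BY id, date_measured;

-- id = 3147 (hand-worked):
--   date_measured         out_of_range   rn_per_id   rn_per_id_flag   rn
--   2019-08-09 20:00:00              1           1                1    0
--   2019-08-09 21:00:00              0           2                1    1
--   2019-08-09 22:00:00              0           3                2    1
--   2019-08-09 23:00:00              1           4                2    2
--   2019-08-10 00:00:00              1           5                3    2
--   2019-08-10 01:00:00              1           6                4    2
--   2019-08-10 02:00:00              0           7                3    4
-- Keeping only out_of_range = 1 and grouping by id, rn gives the runs of
-- length 1 (rn = 0) and 3 (rn = 2) shown in the output above.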
Upvotes: 1