Reputation: 37
I have the following dataset:
The rows are sorted in ascending order by the start_time field, and I want to group rows that have a sequence of false values, until the first true value, including the first true value.
That is, for the above dataset I want the following output:
(the grouping field can include other values than what I wrote)
Upvotes: 0
Views: 488
Reputation: 6741
With Vertica, you would write a much better readable query with Vertica's lovely CONDITIONAL_TRUE_EVENT()
function, which is an analytic function that starts with 0 at each PARTITION BY
expression and increments by 1 every time the Boolean expression is true.
You need an increment every time you have a gap of more than 1 day or your preceding row was at TRUE
as well as your current row. So:
WITH
-- your input ...
indata(start_time,bool) AS (
SELECT TIMESTAMP '2020-10-12 08:00',FALSE
UNION ALL SELECT TIMESTAMP '2020-10-12 08:04',FALSE
UNION ALL SELECT TIMESTAMP '2020-10-12 08:08',TRUE
UNION ALL SELECT TIMESTAMP '2020-10-12 08:18',TRUE
UNION ALL SELECT TIMESTAMP '2020-12-10 08:30',FALSE
UNION ALL SELECT TIMESTAMP '2020-12-10 08:31',FALSE
UNION ALL SELECT TIMESTAMP '2020-12-10 08:34',FALSE
UNION ALL SELECT TIMESTAMP '2020-12-10 08:38',FALSE
)
SELECT
*
, CONDITIONAL_TRUE_EVENT(
start_time - LAG(start_time) > INTERVAL '1 DAY'
OR (bool AND LAG(bool) )
) OVER(
PARTITION BY 1 ORDER BY start_time
) + 1
AS sessid
FROM indata;
-- out start_time |bool |sessid
-- out 2020-10-12 08:00:00|false| 1
-- out 2020-10-12 08:04:00|false| 1
-- out 2020-10-12 08:08:00|true | 1
-- out 2020-10-12 08:18:00|true | 2
-- out 2020-12-10 08:30:00|false| 3
-- out 2020-12-10 08:31:00|false| 3
-- out 2020-12-10 08:34:00|false| 3
-- out 2020-12-10 08:38:00|false| 3
Upvotes: 0
Reputation: 222492
I think a window sum can do what you want:
select t.*,
1 + coalesce(sum(case when bool = true then 1 else 0 end) over(
order by start_time
rows between unbounded preceding and 1 preceding
), 0) as grp
from mytable t
Upvotes: 3