Sagiv
Sagiv

Reputation: 37

Grouping rows by value until it changes (grouping includes the first changed value)

I have the following dataset:

enter image description here

The rows are sorted in ascending order by the start_time field, and I want to group rows that have a sequence of false values, until the first true value, including the first true value.

That is, for the above dataset I want the following output:

enter image description here

(the grouping field can include other values than what I wrote)

Upvotes: 0

Views: 488

Answers (2)

marcothesane
marcothesane

Reputation: 6741

With Vertica, you would write a much better readable query with Vertica's lovely CONDITIONAL_TRUE_EVENT() function, which is an analytic function that starts with 0 at each PARTITION BY expression and increments by 1 every time the Boolean expression is true.

You need an increment every time you have a gap of more than 1 day or your preceding row was at TRUE as well as your current row. So:

WITH
-- your input ...
indata(start_time,bool) AS (
           SELECT TIMESTAMP '2020-10-12 08:00',FALSE
 UNION ALL SELECT TIMESTAMP '2020-10-12 08:04',FALSE
 UNION ALL SELECT TIMESTAMP '2020-10-12 08:08',TRUE
 UNION ALL SELECT TIMESTAMP '2020-10-12 08:18',TRUE
 UNION ALL SELECT TIMESTAMP '2020-12-10 08:30',FALSE
 UNION ALL SELECT TIMESTAMP '2020-12-10 08:31',FALSE
 UNION ALL SELECT TIMESTAMP '2020-12-10 08:34',FALSE
 UNION ALL SELECT TIMESTAMP '2020-12-10 08:38',FALSE
)
SELECT
  *
, CONDITIONAL_TRUE_EVENT(
      start_time - LAG(start_time) > INTERVAL '1 DAY'
   OR (bool AND LAG(bool) )
  ) OVER(
    PARTITION BY 1 ORDER BY start_time
  ) + 1
  AS sessid
FROM indata;
-- out start_time         |bool |sessid
-- out 2020-10-12 08:00:00|false|     1
-- out 2020-10-12 08:04:00|false|     1
-- out 2020-10-12 08:08:00|true |     1
-- out 2020-10-12 08:18:00|true |     2
-- out 2020-12-10 08:30:00|false|     3
-- out 2020-12-10 08:31:00|false|     3
-- out 2020-12-10 08:34:00|false|     3
-- out 2020-12-10 08:38:00|false|     3

Upvotes: 0

GMB
GMB

Reputation: 222492

I think a window sum can do what you want:

select t.*, 
    1 + coalesce(sum(case when bool = true then 1 else 0 end) over(
            order by start_time
            rows between unbounded preceding and 1 preceding
        ), 0) as grp
from mytable t

Upvotes: 3

Related Questions