FreshTransistor
FreshTransistor

Reputation: 109

Increment by one when value changes

I'm unable to understand this part

this is the data

WITH data AS (
              SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
   )

And when I run below

 WITH data AS (
              SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
   )                     
    SELECT person, DAY, flag,
                 CASE WHEN flag = lag(flag) over (PARTITION BY person
                                                   ORDER BY DAY)
                       THEN 0
                       ELSE 1
                  END gap
             FROM DATA; 

give me this output

person1 day1    Y   1
person1 day2    Y   0
person1 day3    Y   0
person1 day4    N   1
person1 day5    N   0
person1 day6    Y   1
person1 day7    Y   0
person1 day8    Y   0

Upto here I understand. Want to know when I add below part in above query

SELECT person, DAY, flag, SUM(gap) over (PARTITION BY person
                                            ORDER BY DAY) grp

Like this

WITH data AS (
              SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
   )
   
   SELECT person, DAY, flag, SUM(gap) over (PARTITION BY person
                                            ORDER BY DAY) grp
   From
   ( SELECT person, DAY, flag,
                 CASE WHEN flag = lag(flag) over (PARTITION BY person
                                                   ORDER BY DAY)
                       THEN 0
                       ELSE 1
                  END gap
             FROM DATA) ;

How come its giving result like this ? Its doing sum over all partitions of person1, how value is being summed here ? Can anyone help me to understand here ? Not able to understand how the 4th column is derived.

person1 day1    Y   1
person1 day2    Y   1
person1 day3    Y   1
person1 day4    N   2
person1 day5    N   2
person1 day6    Y   3
person1 day7    Y   3
person1 day8    Y   3

Upvotes: 2

Views: 1648

Answers (3)

MT0
MT0

Reputation: 167822

From the analytic functions documentation:

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

So your query is effectively:

WITH data AS (
              SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person,
       DAY,
       flag,
       SUM(gap) over (
         PARTITION BY person
         ORDER BY DAY
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS grp
From   (
  SELECT person,
         DAY,
         flag,
         CASE WHEN flag = lag(flag) over (
                            PARTITION BY person
                            ORDER BY DAY
                          )
         THEN 0
         ELSE 1
         END AS gap
  FROM   DATA
);

This means that, for each row (for each person ordered by day), you are performing the SUM of the changes in the flag only over that day and all the preceding days.

So:

  • When day is day1 the the SUM is only over the change in flags for the day1 row.
  • When day is day2 the the SUM is over the change in flags for the day1 and day2 rows.
  • When day is day3 the the SUM is over the change in flags for the day1, day2 and day3 rows.
  • etc.

From Oracle 12, you can write the query in a simpler format using MATCH_RECOGNIZE:

WITH data AS (
              SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
    UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person,
       DAY,
       flag,
       grp
FROM   DATA
MATCH_RECOGNIZE(
  PARTITION BY person
  ORDER BY day
  MEASURES
    MATCH_NUMBER() AS grp
  ALL ROWS PER MATCH
  PATTERN (same_flags+)
  DEFINE same_flags AS FIRST(flag) = flag
);

db<>fiddle here

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520918

This is a gaps and islands problem, where each island is defined as being a cluster of records belonging to the same person and flag values, as ordered by the day column. We can use the difference in row numbers method along with a few other tricks here.

WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person ORDER BY day) rn1,
                ROW_NUMBER() OVER (PARTITION BY person, flag ORDER BY day) rn2
    FROM data t
),
cte2 AS (
    SELECT t.*, MIN(day) OVER (PARTITION BY person, rn1 - rn2) min_day
    FROM cte t
)

SELECT person, day, flag, DENSE_RANK() OVER (ORDER BY min_day) grp
FROM cte2
ORDER BY person, day;

Demo

Note that rather than having a text day column with values like day1, day2, etc., it would be better to have an actual date or timestamp column there. For the 9 records of sample data you did provide though, it happens to work.

Upvotes: 1

Jon Armstrong
Jon Armstrong

Reputation: 4694

It's probably easier to understand if we keep gap in the result. I've updated the fiddle to include gap in the result.

This is a running sum (of the gap value) over all the rows up to this row in the partition, but in the order specified. Each row in the partition (base on the order) adds to the sum.

Here's a fiddle / test case, to show different ways to order sum, and the last example removes the ORDER BY. Without the ORDER BY the sum is performed over all rows at once.

Test case:

Test case with various examples of sum order

SUM(gap) over (PARTITION BY person ORDER BY DAY) grp

enter image description here

SUM(gap) over (PARTITION BY person ORDER BY DAY DESC) grp

enter image description here

SUM(gap) over (PARTITION BY person) grp

enter image description here

Upvotes: 1

Related Questions