Reputation: 109
I'm unable to understand this part
this is the data
WITH data AS (
SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
And when I run below
WITH data AS (
SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person, DAY, flag,
CASE WHEN flag = lag(flag) over (PARTITION BY person
ORDER BY DAY)
THEN 0
ELSE 1
END gap
FROM DATA;
give me this output
person1 day1 Y 1
person1 day2 Y 0
person1 day3 Y 0
person1 day4 N 1
person1 day5 N 0
person1 day6 Y 1
person1 day7 Y 0
person1 day8 Y 0
Upto here I understand. Want to know when I add below part in above query
SELECT person, DAY, flag, SUM(gap) over (PARTITION BY person
ORDER BY DAY) grp
Like this
WITH data AS (
SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person, DAY, flag, SUM(gap) over (PARTITION BY person
ORDER BY DAY) grp
From
( SELECT person, DAY, flag,
CASE WHEN flag = lag(flag) over (PARTITION BY person
ORDER BY DAY)
THEN 0
ELSE 1
END gap
FROM DATA) ;
How come its giving result like this ? Its doing sum
over all partitions of person1
, how value is being summed here ? Can anyone help me to understand here ? Not able to understand how the 4th column is derived.
person1 day1 Y 1
person1 day2 Y 1
person1 day3 Y 1
person1 day4 N 2
person1 day5 N 2
person1 day6 Y 3
person1 day7 Y 3
person1 day8 Y 3
Upvotes: 2
Views: 1648
Reputation: 167822
From the analytic functions documentation:
If you omit the windowing_clause entirely, then the default is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
So your query is effectively:
WITH data AS (
SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person,
DAY,
flag,
SUM(gap) over (
PARTITION BY person
ORDER BY DAY
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS grp
From (
SELECT person,
DAY,
flag,
CASE WHEN flag = lag(flag) over (
PARTITION BY person
ORDER BY DAY
)
THEN 0
ELSE 1
END AS gap
FROM DATA
);
This means that, for each row (for each person
ordered by day
), you are performing the SUM
of the changes in the flag
only over that day and all the preceding days.
So:
day
is day1
the the SUM
is only over the change in flags for the day1
row.day
is day2
the the SUM
is over the change in flags for the day1
and day2
rows.day
is day3
the the SUM
is over the change in flags for the day1
, day2
and day3
rows.From Oracle 12, you can write the query in a simpler format using MATCH_RECOGNIZE
:
WITH data AS (
SELECT 'person1' person, 'day1' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day2' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day3' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day4' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day5' day, 'N' flag FROM dual
UNION ALL SELECT 'person1' person, 'day6' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day7' day, 'Y' flag FROM dual
UNION ALL SELECT 'person1' person, 'day8' day, 'Y' flag FROM dual
)
SELECT person,
DAY,
flag,
grp
FROM DATA
MATCH_RECOGNIZE(
PARTITION BY person
ORDER BY day
MEASURES
MATCH_NUMBER() AS grp
ALL ROWS PER MATCH
PATTERN (same_flags+)
DEFINE same_flags AS FIRST(flag) = flag
);
db<>fiddle here
Upvotes: 0
Reputation: 520918
This is a gaps and islands problem, where each island is defined as being a cluster of records belonging to the same person and flag values, as ordered by the day column. We can use the difference in row numbers method along with a few other tricks here.
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person ORDER BY day) rn1,
ROW_NUMBER() OVER (PARTITION BY person, flag ORDER BY day) rn2
FROM data t
),
cte2 AS (
SELECT t.*, MIN(day) OVER (PARTITION BY person, rn1 - rn2) min_day
FROM cte t
)
SELECT person, day, flag, DENSE_RANK() OVER (ORDER BY min_day) grp
FROM cte2
ORDER BY person, day;
Note that rather than having a text day
column with values like day1
, day2
, etc., it would be better to have an actual date or timestamp column there. For the 9 records of sample data you did provide though, it happens to work.
Upvotes: 1
Reputation: 4694
It's probably easier to understand if we keep gap
in the result. I've updated the fiddle to include gap
in the result.
This is a running sum (of the gap value) over all the rows up to this row in the partition, but in the order specified. Each row in the partition (base on the order) adds to the sum.
Here's a fiddle / test case, to show different ways to order sum, and the last example removes the ORDER BY
. Without the ORDER BY
the sum is performed over all rows at once.
Test case:
Test case with various examples of sum order
SUM(gap) over (PARTITION BY person ORDER BY DAY) grp
SUM(gap) over (PARTITION BY person ORDER BY DAY DESC) grp
SUM(gap) over (PARTITION BY person) grp
Upvotes: 1