Reputation: 29
I'm looking to use the Oracle analytical functions to neatly retrieve the range of a number of rows. (I guess mainly the LAG and LEAD functions here.)
For example: I have the following values: -2, -1, 0, 1, 2, 3, 5, 6, 7, 9, 12, 14
I'm looking to retrieve the values as the following rows in an elegant way.
Start_Value | End_Value | Step |
---|---|---|
-2 | 3 | 1 |
5 | 7 | 1 |
9 | 12 | 3 |
14 | NULL | NULL |
See below for table creation scripts:
create table temp_data
(
data_value number(16,4)
);
insert into temp_data values(-2);
insert into temp_data values(-1);
insert into temp_data values(0);
insert into temp_data values(1);
insert into temp_data values(2);
insert into temp_data values(3);
insert into temp_data values(5);
insert into temp_data values(6);
insert into temp_data values(7);
insert into temp_data values(9);
insert into temp_data values(12);
insert into temp_data values(14);
Below is my very complex SQL, which sort of works, but is impossible to read, and really ugly.
select start_value,
case when start_value = end_value then null else step end as step,
case when start_value = end_value then null else end_value end as end_value
from
(
select case when row_number > 1 then start_value + step else start_value end as start_value, step, end_value
from
(
select data_value as start_value, next_step as step, lead(data_value) over (order by data_value) as end_value, rownum as row_number
from
(
select data_value, next_step, prev_step
from
(
select data_value, data_value - prev_value as prev_step, next_value - data_value as next_step
from
(
select data_value, lead(data_value) over (order by data_value) as next_value, lag(data_value) over (order by data_value) as prev_value
from
(
select distinct data_value
from temp_data
order by data_value
)
)
)
where prev_step is null
or next_step is null
or prev_step <> next_step
)
)
)
where start_value is not null;
It will retrieve the result as belows:
Start_Value | End_Value | Step |
---|---|---|
-2 | 3 | 1 |
5 | NULL | NULL |
6 | 7 | 1 |
9 | NULL | NULL |
12 | NULL | NULL |
14 | NULL | NULL |
My question is simply, is there a more elegant way to do this?
Upvotes: 0
Views: 85
Reputation:
The problem can be formulated in a deterministic way, as follows:
Order the values in ascending order. Take the first two values (the two smallest ones), compute the difference, call it "step". Consider these two values as the first two in a possibly longer arithmetic progression (constant step between each pair of consecutive numbers). Stop the "range" as long as the next step is different from the previous one. Take note of the first value, the last value and the step for this "range".
Then, repeat for the remainder of the values in the table.
In the end, if there is exactly one value left (so you can't really "repeat"), declare that this last value is the "start", the end and the step are NULL, and we are done.
The fact that in each case we don't "stop" a range until we must is what makes this problem statement deterministic.
This can be expressed almost directly in MATCH_RECOGNIZE, as shown below.
select *
from temp_data
match_recognize(
order by data_value
measures a.data_value as start_value,
last(b.data_value) as end_value,
first(b.data_value) - a.data_value as step
pattern ( a b* )
define b as
data_value - prev(data_value) = first(b.data_value) - a.data_value
);
START_VALUE END_VALUE STEP
----------- ---------- ----------
-2 3 1
5 7 1
9 12 3
14
EDIT
( NOTE: An earlier edit, which I have now deleted, gave an incorrect solution using analytic functions. If anyone wants to see it, that is still possible, by following the "Edited" link at the bottom.)
Here is a different solution, using the recursive subquery factoring feature added in Oracle 11.2. It essentially mimics what match_recognize
does. The CLS
column created in the recursive subquery corresponds to the CLASSIFIER()
function in match_recognize
, and GRP
corresponds to MATCH_NUMBER()
. However, one doesn't need to know anything about match_recognize
to understand the recursive solution shown below.
with
p (rn, val) as (
select row_number() over (order by data_value), data_value from temp_data
)
, r (rn, val, step, cls, grp) as (
select 1, val, null, 'A', 1
from p
where rn = 1
union all
select p.rn, p.val, p.val - r.val,
case when r.cls = 'A' or p.val = r.val + r.step
then 'B' else 'A' end,
case when r.cls = 'A' or p.val = r.val + r.step
then r.grp else r.grp + 1 end
from p join r on p.rn = r.rn + 1
)
select min(val) as start_value,
max(case cls when 'B' then val end) as end_value,
min(case cls when 'B' then step end) as step
from r
group by grp
order by grp
;
Upvotes: 1