Samuel
Samuel

Reputation: 29

Oracle analytical SQL - Retrieve the start, end and step of a range

I'm looking to use the Oracle analytical functions to neatly retrieve the range of a number of rows. (I guess mainly the LAG and LEAD functions here.)

For example: I have the following values: -2, -1, 0, 1, 2, 3, 5, 6, 7, 9, 12, 14

I'm looking to retrieve the values as the following rows in an elegant way.

Start_Value End_Value Step
-2 3 1
5 7 1
9 12 3
14 NULL NULL

See below for table creation scripts:

create table temp_data
(
    data_value number(16,4)
);
insert into temp_data values(-2);
insert into temp_data values(-1);
insert into temp_data values(0);
insert into temp_data values(1);
insert into temp_data values(2);
insert into temp_data values(3);
insert into temp_data values(5);
insert into temp_data values(6);
insert into temp_data values(7);
insert into temp_data values(9);
insert into temp_data values(12);
insert into temp_data values(14);

Below is my very complex SQL, which sort of works, but is impossible to read, and really ugly.

select start_value, 
       case when start_value = end_value then null else step end as step,
       case when start_value = end_value then null else end_value end as end_value
from
(
    select case when row_number > 1 then start_value + step else start_value end as start_value, step, end_value
    from
    (
        select data_value as start_value, next_step as step, lead(data_value) over (order by data_value) as end_value, rownum as row_number
        from
        (
            select data_value, next_step, prev_step
            from
            (
                select data_value, data_value - prev_value as prev_step, next_value - data_value as next_step
                from
                (
                    select data_value, lead(data_value) over (order by data_value) as next_value, lag(data_value) over (order by data_value) as prev_value
                    from
                    (
                        select distinct data_value
                        from   temp_data
                        order by data_value
                    )
                )
            )
            where prev_step is null
            or next_step is null
            or prev_step <> next_step
        )
    )
)
where start_value is not null;

It will retrieve the result as belows:

Start_Value End_Value Step
-2 3 1
5 NULL NULL
6 7 1
9 NULL NULL
12 NULL NULL
14 NULL NULL

My question is simply, is there a more elegant way to do this?

Upvotes: 0

Views: 85

Answers (1)

user5683823
user5683823

Reputation:

The problem can be formulated in a deterministic way, as follows:

Order the values in ascending order. Take the first two values (the two smallest ones), compute the difference, call it "step". Consider these two values as the first two in a possibly longer arithmetic progression (constant step between each pair of consecutive numbers). Stop the "range" as long as the next step is different from the previous one. Take note of the first value, the last value and the step for this "range".

Then, repeat for the remainder of the values in the table.

In the end, if there is exactly one value left (so you can't really "repeat"), declare that this last value is the "start", the end and the step are NULL, and we are done.

The fact that in each case we don't "stop" a range until we must is what makes this problem statement deterministic.

This can be expressed almost directly in MATCH_RECOGNIZE, as shown below.

select *
from   temp_data
match_recognize(
  order    by data_value
  measures a.data_value                       as start_value,
           last(b.data_value)                 as end_value,
           first(b.data_value) - a.data_value as step
  pattern  ( a b* )
  define   b as
             data_value - prev(data_value) = first(b.data_value) - a.data_value
);

START_VALUE  END_VALUE       STEP
----------- ---------- ----------
         -2          3          1
          5          7          1
          9         12          3
         14                      

EDIT

( NOTE: An earlier edit, which I have now deleted, gave an incorrect solution using analytic functions. If anyone wants to see it, that is still possible, by following the "Edited" link at the bottom.)

Here is a different solution, using the recursive subquery factoring feature added in Oracle 11.2. It essentially mimics what match_recognize does. The CLS column created in the recursive subquery corresponds to the CLASSIFIER() function in match_recognize, and GRP corresponds to MATCH_NUMBER(). However, one doesn't need to know anything about match_recognize to understand the recursive solution shown below.

with
  p (rn, val) as (
    select row_number() over (order by data_value), data_value from temp_data
  )
, r (rn, val, step, cls, grp) as (
    select  1, val, null, 'A', 1
      from  p
      where rn = 1
    union all
    select  p.rn, p.val, p.val - r.val,
            case when r.cls = 'A' or p.val = r.val + r.step 
                 then 'B' else 'A' end,
            case when r.cls = 'A' or p.val = r.val + r.step 
                 then r.grp else r.grp + 1 end
      from  p join r on p.rn = r.rn + 1
  )
select min(val) as start_value, 
       max(case cls when 'B' then val end) as end_value,
       min(case cls when 'B' then step end) as step
from   r
group  by grp
order  by grp
;

Upvotes: 1

Related Questions