Gideon
Gideon

Reputation: 1507

Group rows into range while also showing gap

I am in need of a database select query solution for transforming a series of breakdown details into a summarized version of it, possibly in a view. Consider the table below. It has a composite primary key of (PK_1, PK_2, PK_3, and SEQUENCE_NO).

PK_1     PK_2 PK_3 SEQUENCE_NO STATUS_CODE
======== ==== ==== =========== ===========
20200421 A    1    1           Y
20200421 A    1    2           Y
20200421 A    1    3           Y
20200421 A    1    4           N
20200421 A    1    5           Y
20200421 A    1    6           Y
20200421 A    2    7           Y
20200421 A    2    8           Y
20200421 B    3    9           Y
20200421 B    3    10          Y
20200421 B    3    11          Y
20200422 B    3    11          Y

Only including all records with the STATUS_CODE of "Y", how can I present the records in such a way consecutive records, respective to their composite primary keys, form ranges of value (indicated by SEQUENCE_FROM and SEQUENCE_TO, see below) while showing a gap that may indicate of a missing row, or a row with a STATUS_CODE with a value other than "Y"?

PK_1     PK_2 PK_3 SEQUENCE_FROM SEQUENCE_TO
======== ==== ==== ============= ===========
20200421 A    1    1             3
20200421 A    1    5             6
20200421 A    2    7             8
20200421 B    3    9             11
20200422 B    3    11            11

I used MIN and MAX but obviously it wouldn't accomodate showing the gap between ranges.

Upvotes: 1

Views: 112

Answers (3)

user5683823
user5683823

Reputation:

Here is one way to do this, using the match_recognize clause (Oracle 12.1 and higher). If I understand correctly, a "gap" exists when one or more rows with status code 'N' exist - and only when such rows are preceded and followed by 'Y' rows for the same combination of pk_1, pk_2, pk_3. So, in your sample data, there is only one such gap. You didn't explain exactly what you want to show (I assume a number, but you didn't explain how to compute it). I interpreted it to mean the difference between the starting value of the "current" sequence and the ending value of the "preceding" sequence.

with
  yourtable (pk_1, pk_2, pk_3, sequence_no, status_code) as (
    select 20200421, 'A', 1,  1, 'Y' from dual union all
    select 20200421, 'A', 1,  2, 'Y' from dual union all
    select 20200421, 'A', 1,  3, 'Y' from dual union all
    select 20200421, 'A', 1,  4, 'N' from dual union all
    select 20200421, 'A', 1,  5, 'Y' from dual union all
    select 20200421, 'A', 1,  6, 'Y' from dual union all
    select 20200421, 'A', 2,  7, 'Y' from dual union all
    select 20200421, 'A', 2,  8, 'Y' from dual union all
    select 20200421, 'B', 3,  9, 'Y' from dual union all
    select 20200421, 'B', 3, 10, 'Y' from dual union all
    select 20200421, 'B', 3, 11, 'Y' from dual union all
    select 20200422, 'B', 3, 11, 'Y' from dual
  )
select mr.*
     , sequence_from - lag(sequence_to) over (partition by pk_1, pk_2, pk_3 
                                              order by sequence_from) as gap
from   yourtable
match_recognize(
  partition by pk_1, pk_2, pk_3
  order     by sequence_no
  measures  first(sequence_no) as sequence_from
         ,  last (sequence_no) as sequence_to
  pattern   ( Y+ )
  define    Y as status_code = 'Y'
) mr
;

Output:

      PK_1 PK_2       PK_3 SEQUENCE_FROM SEQUENCE_TO        GAP    
---------- ---- ---------- ------------- ----------- ----------
  20200421 A             1             1           3           
  20200421 A             1             5           6          2
  20200421 A             2             7           8           
  20200421 B             3             9          11           
  20200422 B             3            11          11           

Upvotes: 1

zealous
zealous

Reputation: 7503

Try this, should work as expected. you can take a look at fiddle.

select
    PK_1,
    PK_2,
    PK_3,
    min(sequence_no) as SEQUENCE_FROM,
    max(sequence_no) as SEQUENCE_TO
from
(
    select
        *,
        sequence_no - row_number() over (partition by STATUS_CODE, PK_3 order by SEQUENCE_NO) as rnk
    from myTable
    where STATUS_CODE = 'Y'
) t    
group by
    PK_1,
    PK_2,
    PK_3,
    rnk

Output:

pk_1     pk_2 pk_3 sequence_from sequence_to
---------------------------------------------
20200421  A     1       1             3
20200421  A     1       5             6
20200421  A     2       7             8
20200421  B     3       9             11
20200422  B     3       11            11

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520908

This is a gaps and islands problem. Here is one way to solve with ROW_NUMBER, using the difference in row numbers method:

WITH cte AS (
    SELECT t.*, SEQUENCE_NO -
        ROW_NUMBER() OVER (PARTITION BY PK_1, PK_2, PK_3 ORDER BY SEQUENCE_NO) AS diff
    FROM yourTable t
    WHERE STATUS_CODE = 'Y'
)


SELECT
    PK_1,
    PK_2,
    PK_3,
    MIN(SEQUENCE_NO) AS SEQUENCE_FROM,
    MAX(SEQUENCE_NO) AS SEQUENCE_TO
FROM cte
GROUP BY
    PK_1,
    PK_2,
    PK_3,
    (rn1 - rn2)
ORDER BY
    PK_1,
    PK_2,
    PK_3;

screen capture of demo below

Demo

The crux of the logic being used here is that we form an on-the-fly group number for each island, within each PK_1, PK_2, PK_3 partition, by taking the difference between the sequence number and a ROW_NUMBER. This difference is guaranteed to always be unique for each island.

Upvotes: 2

Related Questions