Gav
Gav

Reputation: 13

Oracle 11.2 SQL - help to condense data in ordered set

I have a data-set with a timestamp column and multiple identifier columns. I want to condense it to a single row for each "block" of adjacent rows with equal identifiers, when ordered by the timestamp. The min and max timestamp for each block is required.

Source Data:

TSTAMP  ID1  ID2
t1      A    B  <= start of new block
t2      A    B
t3      C    D  <= start of new block
t4      E    F  <= start of new block
t5      E    F
t6      E    F
t7      A    B  <= start of new block
t8      G    H  <= start of new block

Desired Result:

MIN_TSTAMP  MAX_TSTAMP  ID1  ID2
t1          t2          A    B
t3          t3          C    D
t4          t6          E    F
t7          t7          A    B
t8          t8          G    H

I thought this was ripe for a window-ing analytic function but I cannot partition without grouping ALL equal combinations of IDn - rather than only those in adjacent rows, when ordered by timestamp.

A workaround is to create a key column first in an in-line view that I can later group by i.e. with same value for each row in the block and different value for each block. I can do this using LAG analytic function to compare row values and then calling a PL/SQL function to return nextval/currval values of a sequence (calling nextval/currval directly in the SQL is restricted in this context).

select min(ilv.tstamp), max(ilv.tstamp), id1, id2
from (
  select case when (id1 != lag(id1,1,'*') over (partition by (1) order by tstamp) 
                 or id2 != lag(id2,1,'*') over (partition by (1) order by tstamp))
           then
             pk_seq_utils.gav_get_nextval
           else
             pk_seq_utils.gav_get_currval
           end ident, t.*
  from tab1 t
  order by tstamp) ilv
group by ident, id1, id2
order by 1;

where the gav_get_xxx functions simply return currval/nextval from a sequence.

But I would like to use SQL only and avoid PL/SQL (as I could also write this easily in PL/SQL and pipe out the result-rows from a pipeline function).

Any ideas?

Thanks.

Upvotes: 1

Views: 207

Answers (4)

Alex Poole
Alex Poole

Reputation: 191235

You can use an analytic 'trick' to identify the gaps and islands, comparing the position of each row just against the tstamp across all rows with its position against tstamp just for that id2, id2 combination:

select tstamp, id1, id2,
  row_number() over (partition by id1, id2 order by tstamp)
    - row_number() over (order by tstamp) as block_id
from tab1;

TS I I   BLOCK_ID
-- - - ----------
t1 A B          0
t2 A B          0
t3 C D         -2
t4 E F         -3
t5 E F         -3
t6 E F         -3
t7 A B         -4
t8 G H         -7

The actual value of block_id doesn't matter, just that it's unique for each block for the combination. You can then group using that:

select min(tstamp) as min_tstamp, max(tstamp) as max_tstamp, id1, id2
from (
  select tstamp, id1, id2,
    row_number() over (partition by id1, id2 order by tstamp)
      - row_number() over (order by tstamp) as block_id
  from tab1
)
group by id1, id2, block_id
order by min(tstamp);

MI MA I I
-- -- - -
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H

Upvotes: 2

jpw
jpw

Reputation: 44871

You should be able to use the row_number window function to do this, like below:

select 
    min(tstamp) mints, max(tstamp) maxts, id1, id2
from (
    select 
       *, 
       row_number() over (order by tstamp) 
     - row_number() over (partition by id1, id2 order by tstamp) as rn
    from t
) as subq
group by id1, id2, rn
order by rn

I haven't been able to test it with any Oracle db, but it works with MSSQL and should work in Oracle too as the window function works the same way.

Upvotes: 2

Boneist
Boneist

Reputation: 23578

Tabibitosan to the rescue!

with sample_data as (select 't1' tstamp, 'A' id1, 'B' id2 from dual union all
                     select 't2' tstamp, 'A' id1, 'B' id2 from dual union all
                     select 't3' tstamp, 'C' id1, 'D' id2 from dual union all
                     select 't4' tstamp, 'E' id1, 'F' id2 from dual union all
                     select 't5' tstamp, 'E' id1, 'F' id2 from dual union all
                     select 't6' tstamp, 'E' id1, 'F' id2 from dual union all
                     select 't7' tstamp, 'A' id1, 'B' id2 from dual union all
                     select 't8' tstamp, 'G' id1, 'H' id2 from dual)
select   min(tstamp) min_tstamp, max(tstamp) max_tstamp, id1, id2
from     (select tstamp,
                 id1,
                 id2,
                 row_number() over (order by tstamp) - row_number() over (partition by id1, id2 order by tstamp) grp
          from   sample_data)
group by id1,
         id2,
         grp
order by min(tstamp);

MIN_TSTAMP MAX_TSTAMP ID1 ID2
---------- ---------- --- ---
t1         t2         A   B  
t3         t3         C   D  
t4         t6         E   F  
t7         t7         A   B  
t8         t8         G   H  

Upvotes: 4

Thorsten Kettner
Thorsten Kettner

Reputation: 94859

You need to do this step by step:

  1. Detect ID changes with LAG marking each change with a flag = 1.
  2. Generate keys for the groups (i.e. adjacent records with the same ID) with SUM over the ID change flags (running total).
  3. Group by generated group key and get min/max timestamp.

Query:

select 
  min(tstamp) as min_tstamp,
  max(tstamp) as max_tstamp,
  min(id1) as id1,
  min(id2) as id2
from
(
  select 
    grouped.*, 
    sum(newgroup) over (order by tstamp) as groupkey
  from
  (
    select 
      mytable.*, 
      case when id1 <> lag(id1) over (order by tstamp) 
             or id2 <> lag(id2) over (order by tstamp) 
      then 1 else 0 end as newgroup 
    from mytable
    order by tstamp
  ) grouped
)
group by groupkey
order by groupkey;

Upvotes: 1

Related Questions