Group historical data

Question

I'm stuck with the following problem and need help:

An object has properties that are calculated every day.
They are stored in a key-value historical table.
Property is mistakenly stored even if it was not changed.

I need a query that will group this data set by "actual values":

If a value was not changed during several days it is output as one row.
If value A was changed to B then back to A, then A, B, A should be output by the query (first A and second A are different date intervals).

Here is a dataset example.

with obj_val_hist as
(
select 123 obj_id, 'k_1' key, 'A' value_, to_date('01.01.2021', 'DD.MM.YYYY') start_dt, to_date('01.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('02.01.2021', 'DD.MM.YYYY') start_dt, to_date('02.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('03.01.2021', 'DD.MM.YYYY') start_dt, to_date('03.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('04.01.2021', 'DD.MM.YYYY') start_dt, to_date('04.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('05.01.2021', 'DD.MM.YYYY') start_dt, to_date('05.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('06.01.2021', 'DD.MM.YYYY') start_dt, to_date('06.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('07.01.2021', 'DD.MM.YYYY') start_dt, to_date('07.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('08.01.2021', 'DD.MM.YYYY') start_dt, to_date('08.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('09.01.2021', 'DD.MM.YYYY') start_dt, to_date('09.01.2021', 'DD.MM.YYYY') end_dt from dual 
)
select * from obj_val_hist where obj_id = 123;

Data set:

obj_id	key	value	start_date	end_date
123	k_1	A	01.01.2021	01.01.2021
123	k_1	A	02.01.2021	02.01.2021
123	k_1	A	03.01.2021	03.01.2021
123	k_1	B	04.01.2021	04.01.2021
123	k_1	B	05.01.2021	05.01.2021
123	k_1	B	06.01.2021	06.01.2021
123	k_1	A	07.01.2021	07.01.2021
123	k_1	A	08.01.2021	08.01.2021
123	k_1	A	09.01.2021	09.01.2021

Expected result:

obj_id	key	value	start_date	end_date
123	k_1	A	01.01.2021	03.01.2021
123	k_1	B	04.01.2021	06.01.2021
123	k_1	A	07.01.2021	09.01.2021

This table contains values for million objects.
It is queried by obj_id and has an index on it.
Performance is a key point so using stored functions is most probably not an option.
This query will be a small part of a big view that is used by an external system.

I expected that there should be an analytic function suited for such a problem.
Something like dense_rank but with the possibility to order by one column (start_dt) but increase value when another column (value_) gets a different value. But I didn't find one.

astentx · Accepted Answer

You may use match_recognize for this, which can also handle gaps in dates and is quite efficient and natural to read:

create table t (
  obj_id
  , key_
  , value_
  , start_date
  , end_date
)
as
select 123, 'k_1', 'A', to_date('01.01.2021', 'dd.mm.yyyy'), to_date('01.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('02.01.2021', 'dd.mm.yyyy'), to_date('02.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('03.01.2021', 'dd.mm.yyyy'), to_date('03.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('04.01.2021', 'dd.mm.yyyy'), to_date('04.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('05.01.2021', 'dd.mm.yyyy'), to_date('05.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('06.01.2021', 'dd.mm.yyyy'), to_date('06.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('07.01.2021', 'dd.mm.yyyy'), to_date('07.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('08.01.2021', 'dd.mm.yyyy'), to_date('08.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('09.01.2021', 'dd.mm.yyyy'), to_date('09.01.2021', 'dd.mm.yyyy') from dual union all
/*Let's skip 10.01*/
select 123, 'k_1', 'A', to_date('11.01.2021', 'dd.mm.yyyy'), to_date('11.01.2021', 'dd.mm.yyyy') from dual union all
/*And extent validity period for some record*/
select 123, 'k_1', 'A', to_date('12.01.2021', 'dd.mm.yyyy'), to_date('13.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('14.01.2021', 'dd.mm.yyyy'), to_date('14.01.2021', 'dd.mm.yyyy') from dual

select *
from t
match_recognize (
  /*For each ID and KEY*/
  partition by obj_id, key_
  
  order by start_date asc
  /*Output attributes*/
  measures
    /*start_date of the first row in match group*/
    final first(start_date) as min_start_date,
    /*end_date of the last row in match group*/
    final last(end_date) as max_end_date,
    /*value itself as it is constant for match group*/
    value_ as val
  /*First row and any consequtive matches*/
  pattern (init A*)
  define
    /*Consequtive are the rows which have the same value in value_ field
      and start_date of the next row is not farther than
      1 day from end_date of the previous row
    */
    A as prev(value_) = value_
      and prev(end_date) + 1 = start_date
)

OBJ_ID | KEY_ | MIN_START_DATE | MAX_END_DATE | VAL
-----: | :--- | :------------- | :----------- | :--
   123 | k_1  | 01-JAN-21      | 03-JAN-21    | A  
   123 | k_1  | 04-JAN-21      | 06-JAN-21    | B  
   123 | k_1  | 07-JAN-21      | 09-JAN-21    | A  
   123 | k_1  | 11-JAN-21      | 14-JAN-21    | A

db<>fiddle here

Group historical data

Answers (2)

Related Questions