Mr.P
Mr.P

Reputation: 1257

How to merge consecutive date range Oracle

I am facing a problem. I can't figure out how to merge consecutive date range rows together, based on two dimensions. One is OK for me, but second makes troubles

Let's imagine table in this structure with four possible scenarios

  emp_id  |  level  |  date_from   |   date_to    
--------------------------------------------------
    1     |   A     |  7/31/2015   |  3/31/2016
    1     |   A     |  4/1/2016    |  1/1/3000

    2     |   A     |  7/31/2015   |  1/1/3000

    3     |   A     |  5/31/2015   |  12/31/2015
    3     |   B     |  1/1/2016    |  3/31/2016
    3     |   A     |  4/1/2016    |  6/30/2016
    3     |   B     |  7/1/2016    |  1/1/3000

    4     |   A     |  5/31/2015   |  12/31/2015
    4     |   A     |  1/1/2016    |  6/30/2016
    4     |   B     |  7/1/2016    |  1/1/3000

I want to merge only those rows, that have consecutive date ranges and act_level = prev_level

I tried to do something like this

SELECT emp_id
, level
, date_from
, date_to
--
, CASE
    WHEN lag(level) over (partition by emp_id order by date_from) = level THEN 
         CASE
             WHEN lag(date_to) over (partition by emp_id, level order by date_from) = date_from-1 
               THEN lag(date_from) over (partition by code_employee, level_name order by date_from)
             ELSE NULL
         END
    ELSE 
         CASE
             WHEN lag(level) over (partition by emp_id order by date_from) = level
                     OR
                  lead(level) over (partition by emp_id order by date_from) = level
                THEN NULL
             ELSE date_from
         END
  END date_from_new
, date_to as date_to_new
--
FROM src_table
--
WHERE 1=1

this gives me nearly the results that I want:

  emp_id  |  level  |  date_from   |   date_to   |  d_from_new | d_from_to 
--------------------------------------------------------------------------
    1     |   A     |  7/31/2015   |  3/31/2016  |           | 3/31/2016
    1     |   A     |  4/1/2016    |  1/1/3000   | 7/31/2015 | 1/1/3000

    2     |   A     |  7/31/2015   |  1/1/3000   | 7/31/2015 | 1/1/3000

    3     |   A     |  5/31/2015   |  12/31/2015 | 5/31/2015 | 12/31/2015
    3     |   B     |  1/1/2016    |  3/31/2016  |  1/1/2016 | 3/31/2016
    3     |   A     |  4/1/2016    |  6/30/2016  |  4/1/2016 | 6/30/2016  
    3     |   B     |  7/1/2016    |  1/1/3000   |  7/1/2016 | 1/1/3000 

    4     |   A     |  5/31/2015   |  12/31/2015 |           | 12/31/2015
    4     |   A     |  1/1/2016    |  6/30/2016  | 5/31/2015 | 6/30/2016
    4     |   B     |  7/1/2016    |  1/1/3000   | 7/1/2016  | 1/1/3000

I will just filter the result for d_from_new (date_from_new) not null values. But I am not sure what's gonna happen if there will be for example 3x the same level with consecutive date range, or 8times.

And honestly - I don't like the query :)

Do you have any "perfomence-friendly" and "eye-friendly" solution?

Upvotes: 5

Views: 2703

Answers (2)

Chris
Chris

Reputation: 11

The following SQL statement answers what seems to be the intended question: Identify consecutive intervals (date_from, date_to) and levels lvl per emp_id and merge them into a single row (emp_id, lvl, date_from, date_to).

The trick is to assign distinct group ids to groups of successive (date_from, date_to) and lvl values. The whole story is done is 2 inline views, tab0 and tab1:

  1. diff_levels and diff_dates in tab0 are non-zero whenever we cross group boundaries
  2. ranked_levels and ranked_dates in tab1 generate cumulatives of the diff fields (thereby making the lvl and (date_from, date_to) groups distinct)
  3. the final table just groups on (ranked_levels, ranked_groups)
with 
    -- tab0:     
    tab0 as (
        select
            emp_id,
            lvl,
            date_from,
            date_to,
            case 
                when 
                    lvl != lag(lvl, 1, lvl) 
                    over(partition by emp_id order by date_from)
                then 1
                else 0
            end diff_levels,
            date_from - lag(date_to, 1, date_from) 
            over(partition by emp_id order by date_from) - 1 diff_dates
        from src_table),
    -- tab1:
    tab1 as (
        select
            emp_id,
            lvl,
            date_from,
            date_to,
            sum(diff_levels) over(
                partition by emp_id
                order by date_from
                range between unbounded preceding and current row) ranked_levels,
            sum(diff_dates) over(
                partition by emp_id
                order by date_from 
                range between unbounded preceding and current row) ranked_dates
        from tab0)
select
    emp_id,
    lvl,
    min(date_from) date_from,
    max(date_to) date_to
from tab1
group by 
    emp_id,
    lvl,
    ranked_levels,
    ranked_dates
order by emp_id, lvl;

The SQL was tested on ORACLE 11g but since it is ANSI SQL it runs everywhere.

I used the tiny table given in a previous answer:

create table src_table (emp_id number(6), lvl varchar2(2), date_from date, date_to date);
insert into src_table values (1, 'A', date '2015-07-31', date '2016-03-31');
insert into src_table values (1, 'A', date '2016-04-01', date '3000-01-01');
insert into src_table values (2, 'A', date '2015-07-31', date '3000-01-01');
insert into src_table values (3, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (3, 'B', date '2016-01-01', date '2016-03-31');
insert into src_table values (3, 'A', date '2016-04-01', date '2016-06-30');
insert into src_table values (3, 'B', date '2016-07-01', date '3000-01-01');
insert into src_table values (4, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (4, 'A', date '2016-01-01', date '2016-06-30');
insert into src_table values (4, 'B', date '2016-07-01', date '3000-01-01');

Here's the output:

EMP_ID LVL DATE_FROM DATE_TO
1 A 31/07/15 01/01/00
2 A 31/07/15 01/01/00
3 A 31/05/15 31/12/15
3 A 01/04/16 30/06/16
3 B 01/01/16 31/03/16
3 B 01/07/16 01/01/00
4 A 31/05/15 30/06/16
4 B 01/07/16 01/01/00

Upvotes: 0

Ponder Stibbons
Ponder Stibbons

Reputation: 14858

Please try this query:

select emp_id, lvl, min(date_from) df, max(date_to) dt
  from (
    select s2.*, rn - sum(marker) over (order by rn) as grp
      from (
        select s1.*,
               row_number() over (order by emp_id, date_from) rn,
               case when lag(lvl) over (partition by emp_id order by date_from) 
                         = lvl
                     and lag(date_to) over (partition by emp_id order by date_from) + 1 
                         = date_from
                    then 1
                    else 0
               end marker
          from src_table s1 ) s2 )
  group by emp_id, lvl, grp
  order by emp_id, min(date_from)

In first subquery S1 I added marker, where 1 is assigned if previous level is corresponding and dates are consecutive. In second subquery this marker is used to build GRP column which has the same values for all matching rows. This column is used in final grouping query to find minimum date_from and maximum date_to. Please run inner queries separately to see what happens in each step. Tested if there are more than two consecutive rows.

Test data and output:

create table src_table (emp_id number(6), lvl varchar2(2), date_from date, date_to date);
insert into src_table values (1, 'A', date '2015-07-31', date '2016-03-31');
insert into src_table values (1, 'A', date '2016-04-01', date '3000-01-01');
insert into src_table values (2, 'A', date '2015-07-31', date '3000-01-01');
insert into src_table values (3, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (3, 'B', date '2016-01-01', date '2016-03-31');
insert into src_table values (3, 'A', date '2016-04-01', date '2016-06-30');
insert into src_table values (3, 'B', date '2016-07-01', date '3000-01-01');
insert into src_table values (4, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (4, 'A', date '2016-01-01', date '2016-06-30');
insert into src_table values (4, 'B', date '2016-07-01', date '3000-01-01');

 EMP_ID LVL DF          DT
------- --- ----------- -----------
      1 A   2015-07-31  3000-01-01
      2 A   2015-07-31  3000-01-01
      3 A   2015-05-31  2015-12-31
      3 B   2016-01-01  2016-03-31
      3 A   2016-04-01  2016-06-30
      3 B   2016-07-01  3000-01-01
      4 A   2015-05-31  2016-06-30
      4 B   2016-07-01  3000-01-01

8 rows selected

Upvotes: 3

Related Questions