Reputation: 1257
I am facing a problem. I can't figure out how to merge consecutive date range rows together, based on two dimensions. One is OK for me, but second makes troubles
Let's imagine table in this structure with four possible scenarios
emp_id | level | date_from | date_to
--------------------------------------------------
1 | A | 7/31/2015 | 3/31/2016
1 | A | 4/1/2016 | 1/1/3000
2 | A | 7/31/2015 | 1/1/3000
3 | A | 5/31/2015 | 12/31/2015
3 | B | 1/1/2016 | 3/31/2016
3 | A | 4/1/2016 | 6/30/2016
3 | B | 7/1/2016 | 1/1/3000
4 | A | 5/31/2015 | 12/31/2015
4 | A | 1/1/2016 | 6/30/2016
4 | B | 7/1/2016 | 1/1/3000
I want to merge only those rows, that have consecutive date ranges and act_level = prev_level
I tried to do something like this
SELECT emp_id
, level
, date_from
, date_to
--
, CASE
WHEN lag(level) over (partition by emp_id order by date_from) = level THEN
CASE
WHEN lag(date_to) over (partition by emp_id, level order by date_from) = date_from-1
THEN lag(date_from) over (partition by code_employee, level_name order by date_from)
ELSE NULL
END
ELSE
CASE
WHEN lag(level) over (partition by emp_id order by date_from) = level
OR
lead(level) over (partition by emp_id order by date_from) = level
THEN NULL
ELSE date_from
END
END date_from_new
, date_to as date_to_new
--
FROM src_table
--
WHERE 1=1
this gives me nearly the results that I want:
emp_id | level | date_from | date_to | d_from_new | d_from_to
--------------------------------------------------------------------------
1 | A | 7/31/2015 | 3/31/2016 | | 3/31/2016
1 | A | 4/1/2016 | 1/1/3000 | 7/31/2015 | 1/1/3000
2 | A | 7/31/2015 | 1/1/3000 | 7/31/2015 | 1/1/3000
3 | A | 5/31/2015 | 12/31/2015 | 5/31/2015 | 12/31/2015
3 | B | 1/1/2016 | 3/31/2016 | 1/1/2016 | 3/31/2016
3 | A | 4/1/2016 | 6/30/2016 | 4/1/2016 | 6/30/2016
3 | B | 7/1/2016 | 1/1/3000 | 7/1/2016 | 1/1/3000
4 | A | 5/31/2015 | 12/31/2015 | | 12/31/2015
4 | A | 1/1/2016 | 6/30/2016 | 5/31/2015 | 6/30/2016
4 | B | 7/1/2016 | 1/1/3000 | 7/1/2016 | 1/1/3000
I will just filter the result for d_from_new (date_from_new) not null values. But I am not sure what's gonna happen if there will be for example 3x the same level with consecutive date range, or 8times.
And honestly - I don't like the query :)
Do you have any "perfomence-friendly" and "eye-friendly" solution?
Upvotes: 5
Views: 2703
Reputation: 11
The following SQL statement answers what seems to be the intended question: Identify consecutive intervals (date_from, date_to)
and levels lvl
per emp_id
and merge them into a single row (emp_id, lvl, date_from, date_to).
The trick is to assign distinct group ids to groups of successive (date_from, date_to)
and lvl
values. The whole story is done is 2 inline views, tab0
and tab1
:
diff_levels
and diff_dates
in tab0
are non-zero whenever we cross group boundariesranked_levels
and ranked_dates
in tab1
generate cumulatives of the diff fields (thereby making the lvl and (date_from, date_to) groups distinct)with
-- tab0:
tab0 as (
select
emp_id,
lvl,
date_from,
date_to,
case
when
lvl != lag(lvl, 1, lvl)
over(partition by emp_id order by date_from)
then 1
else 0
end diff_levels,
date_from - lag(date_to, 1, date_from)
over(partition by emp_id order by date_from) - 1 diff_dates
from src_table),
-- tab1:
tab1 as (
select
emp_id,
lvl,
date_from,
date_to,
sum(diff_levels) over(
partition by emp_id
order by date_from
range between unbounded preceding and current row) ranked_levels,
sum(diff_dates) over(
partition by emp_id
order by date_from
range between unbounded preceding and current row) ranked_dates
from tab0)
select
emp_id,
lvl,
min(date_from) date_from,
max(date_to) date_to
from tab1
group by
emp_id,
lvl,
ranked_levels,
ranked_dates
order by emp_id, lvl;
The SQL was tested on ORACLE 11g but since it is ANSI SQL it runs everywhere.
I used the tiny table given in a previous answer:
create table src_table (emp_id number(6), lvl varchar2(2), date_from date, date_to date);
insert into src_table values (1, 'A', date '2015-07-31', date '2016-03-31');
insert into src_table values (1, 'A', date '2016-04-01', date '3000-01-01');
insert into src_table values (2, 'A', date '2015-07-31', date '3000-01-01');
insert into src_table values (3, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (3, 'B', date '2016-01-01', date '2016-03-31');
insert into src_table values (3, 'A', date '2016-04-01', date '2016-06-30');
insert into src_table values (3, 'B', date '2016-07-01', date '3000-01-01');
insert into src_table values (4, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (4, 'A', date '2016-01-01', date '2016-06-30');
insert into src_table values (4, 'B', date '2016-07-01', date '3000-01-01');
Here's the output:
EMP_ID | LVL | DATE_FROM | DATE_TO |
---|---|---|---|
1 | A | 31/07/15 | 01/01/00 |
2 | A | 31/07/15 | 01/01/00 |
3 | A | 31/05/15 | 31/12/15 |
3 | A | 01/04/16 | 30/06/16 |
3 | B | 01/01/16 | 31/03/16 |
3 | B | 01/07/16 | 01/01/00 |
4 | A | 31/05/15 | 30/06/16 |
4 | B | 01/07/16 | 01/01/00 |
Upvotes: 0
Reputation: 14858
Please try this query:
select emp_id, lvl, min(date_from) df, max(date_to) dt
from (
select s2.*, rn - sum(marker) over (order by rn) as grp
from (
select s1.*,
row_number() over (order by emp_id, date_from) rn,
case when lag(lvl) over (partition by emp_id order by date_from)
= lvl
and lag(date_to) over (partition by emp_id order by date_from) + 1
= date_from
then 1
else 0
end marker
from src_table s1 ) s2 )
group by emp_id, lvl, grp
order by emp_id, min(date_from)
In first subquery S1
I added marker, where 1 is assigned if previous level is corresponding and dates are consecutive. In second subquery this marker is used to build GRP
column which has the same values for all matching rows. This column is used in final grouping query to find minimum date_from
and maximum date_to
. Please run inner queries separately to see what happens in each step. Tested if there are more than two consecutive rows.
Test data and output:
create table src_table (emp_id number(6), lvl varchar2(2), date_from date, date_to date);
insert into src_table values (1, 'A', date '2015-07-31', date '2016-03-31');
insert into src_table values (1, 'A', date '2016-04-01', date '3000-01-01');
insert into src_table values (2, 'A', date '2015-07-31', date '3000-01-01');
insert into src_table values (3, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (3, 'B', date '2016-01-01', date '2016-03-31');
insert into src_table values (3, 'A', date '2016-04-01', date '2016-06-30');
insert into src_table values (3, 'B', date '2016-07-01', date '3000-01-01');
insert into src_table values (4, 'A', date '2015-05-31', date '2015-12-31');
insert into src_table values (4, 'A', date '2016-01-01', date '2016-06-30');
insert into src_table values (4, 'B', date '2016-07-01', date '3000-01-01');
EMP_ID LVL DF DT
------- --- ----------- -----------
1 A 2015-07-31 3000-01-01
2 A 2015-07-31 3000-01-01
3 A 2015-05-31 2015-12-31
3 B 2016-01-01 2016-03-31
3 A 2016-04-01 2016-06-30
3 B 2016-07-01 3000-01-01
4 A 2015-05-31 2016-06-30
4 B 2016-07-01 3000-01-01
8 rows selected
Upvotes: 3