Reputation: 157
I'm trying to join data between three slow change dimension type 2. When I query the result, the sort by date between the dimensions are not as expected.
I have the slow change dimensions below:
Table Subsidiaries
id | name | subsidiary | department | start_date_dep | end_date_dep | last_record_flg |
---|---|---|---|---|---|---|
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 |
1 | John Doe | AL | Sales | 2013-01-01 | 2014-05-01 | 0 |
1 | John Doe | NY | Sales | 2014-05-01 | 1 | |
38 | Ivy Johnson | NY | Sales | 2020-06-01 | 1 |
Table Functions
id | function | start_date_fun | end_date_fun | last_record_flg |
---|---|---|---|---|
1 | operator | 2005-10-01 | 2009-08-01 | 0 |
1 | leader | 2009-08-01 | 2011-10-01 | 0 |
1 | manager | 2011-10-01 | 2017-07-01 | 0 |
1 | director | 2017-07-01 | 1 | |
38 | operator | 2020-06-01 | 1 |
Table Graduations
id | university_graduation | conclusion_date | last_record_flg |
---|---|---|---|
1 | bachelor | 15/12/2005 | 0 |
1 | master | 15/12/2008 | 1 |
38 | bachelor | 15/12/2014 | 1 |
The desired result is:
id | name | subsidiary | department | start_date_dep | end_date_dep | last_record_flg | function | start_date_fun | end_date_fun | last_record_flg | university_graduation | conclusion_date | last_record_flg | max_date | seq | start | end | last_record_flg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | operator | 2005-10-01 | 2009-08-01 | 0 | bachelor | 2005-12-15 | 0 | 2005-12-15 | 1 | 2005-10-01 | 2008-12-15 | 0 |
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | operator | 2005-10-01 | 2009-08-01 | 0 | master | 2008-12-15 | 1 | 2008-12-15 | 1 | 2008-12-15 | 2009-08-01 | 0 |
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | leader | 2009-08-01 | 2011-10-01 | 0 | master | 2008-12-15 | 1 | 2009-08-01 | 1 | 2009-08-01 | 2011-10-01 | 0 |
1 | John Doe | AL | Engineering | 2005-10-01 | 2013-01-01 | 0 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2011-10-01 | 1 | 2011-10-01 | 2013-01-01 | 0 |
1 | John Doe | AL | Sales | 2013-01-01 | 2014-05-01 | 0 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2013-01-01 | 1 | 2013-01-01 | 2014-05-01 | 0 |
1 | John Doe | NY | Sales | 2014-05-01 | NULL | 1 | manager | 2011-10-01 | 2017-07-01 | 0 | master | 2008-12-15 | 1 | 2014-05-01 | 1 | 2014-05-01 | 2017-07-01 | 0 |
1 | John Doe | NY | Sales | 2014-05-01 | NULL | 1 | director | 2017-07-01 | NULL | 1 | master | 2008-12-15 | 1 | 2017-07-01 | 1 | 2017-07-01 | NULL | 1 |
38 | Ivy Johnson | NY | Sales | 2020-06-01 | NULL | 1 | operator | 2020-06-01 | NULL | 1 | bachelor | 2014-12-15 | 1 | 2020-06-01 | 1 | 2020-06-01 | NULL | 1 |
I tried with CROSS APPLY, but is returning only one line for each id. I'm trying with CASE WHEN but the query output is not exactly equal the desired result. In my return the column 'FUNCTION' and 'START_DATE_FUN' not follow the sequence (sort) presented in the desired result, the same occur for columns 'UNIVERSITY_GRADUATION' and 'CONCLUSION_DATE'.
The query:
select
*
from(
select
tb.*
,row_number() over(partition by tb.id,tb.max_date order by tb.max_date) as seq
,tb.max_date as [start]
,lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) as [end]
,case when lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) is null then 1 else 0 end as last_record_flg
from(
select
sb.id
,sb.[name]
,sb.subsidiary
,sb.department
,sb.start_date_dep
,sb.end_date_dep
,sb.last_record_flg as lr_sb
,fc.[function]
,fc.start_date_fun
,fc.end_date_fun
,fc.last_record_flg as lr_fc
,gd.university_graduation
,gd.end_date_grad
,gd.last_record_flg as lr_gd
,case
when sb.start_date_dep >= fc.start_date_fun and sb.start_date_dep >= gd.end_date_grad then sb.start_date_dep
when fc.start_date_fun >= sb.start_date_dep and fc.start_date_fun >= gd.end_date_grad then fc.start_date_fun
else gd.end_date_grad
end as max_date
from
#Subsidiaries as sb
left outer join #Functions as fc
on sb.id = fc.id
left outer join #Graduations as gd
on sb.id = gd.id
) as tb
) as tb2
where
tb2.seq = 1
Below the DDL:
create table #Subsidiaries (
id int
,[name] varchar(15)
,subsidiary varchar(2)
,department varchar(15)
,start_date_dep date
,end_date_dep date
,last_record_flg bit
)
go
insert into #Subsidiaries values
(1,'John Doe','AL','Engineering','2005-10-01','2013-01-01',0),
(1,'John Doe','AL','Sales','2013-01-01','2014-05-01',0),
(1,'John Doe','NY','Sales','2014-05-01',null,1),
(38,'Ivy Johnson','NY','Sales','2020-06-01',null,1)
go
create table #Functions (
id int
,[function] varchar(15)
,start_date_fun date
,end_date_fun date
,last_record_flg bit
)
go
insert into #Functions values
(1,'operator','2005-10-01','2009-08-01',0),
(1,'leader','2009-08-01','2011-10-01',0),
(1,'manager','2011-10-01','2017-07-01',0),
(1,'director','2017-07-01',null,1),
(38,'operator','2020-06-01',null,1)
go
create table #Graduations (
id int
,university_graduation varchar(15)
,end_date_grad date
,last_record_flg bit
)
go
insert into #Graduations values
(1,'bachelor','2005-12-15',0),
(1,'master','2008-12-15',1),
(38,'bachelor','2014-12-15',1)
go
Upvotes: 1
Views: 803
Reputation: 157
Case when someone find the same difficult to join two or more SCD type 2, I could find a reference in this link https://sqlsunday.com/2014/11/30/joining-two-scd2-tables/ (SQL Sunday) that help me to build the query and use the range intervals in the join condition to return result as desired.
Upvotes: 1