ito
ito

Reputation: 157

How to execute join between three slow change dimensions sort by all start date columns?

I'm trying to join data between three slow change dimension type 2. When I query the result, the sort by date between the dimensions are not as expected.

I have the slow change dimensions below:

Table Subsidiaries

id name subsidiary department start_date_dep end_date_dep last_record_flg
1 John Doe AL Engineering 2005-10-01 2013-01-01 0
1 John Doe AL Sales 2013-01-01 2014-05-01 0
1 John Doe NY Sales 2014-05-01 1
38 Ivy Johnson NY Sales 2020-06-01 1

Table Functions

id function start_date_fun end_date_fun last_record_flg
1 operator 2005-10-01 2009-08-01 0
1 leader 2009-08-01 2011-10-01 0
1 manager 2011-10-01 2017-07-01 0
1 director 2017-07-01 1
38 operator 2020-06-01 1

Table Graduations

id university_graduation conclusion_date last_record_flg
1 bachelor 15/12/2005 0
1 master 15/12/2008 1
38 bachelor 15/12/2014 1

The desired result is:

id name subsidiary department start_date_dep end_date_dep last_record_flg function start_date_fun end_date_fun last_record_flg university_graduation conclusion_date last_record_flg max_date seq start end last_record_flg
1 John Doe AL Engineering 2005-10-01 2013-01-01 0 operator 2005-10-01 2009-08-01 0 bachelor 2005-12-15 0 2005-12-15 1 2005-10-01 2008-12-15 0
1 John Doe AL Engineering 2005-10-01 2013-01-01 0 operator 2005-10-01 2009-08-01 0 master 2008-12-15 1 2008-12-15 1 2008-12-15 2009-08-01 0
1 John Doe AL Engineering 2005-10-01 2013-01-01 0 leader 2009-08-01 2011-10-01 0 master 2008-12-15 1 2009-08-01 1 2009-08-01 2011-10-01 0
1 John Doe AL Engineering 2005-10-01 2013-01-01 0 manager 2011-10-01 2017-07-01 0 master 2008-12-15 1 2011-10-01 1 2011-10-01 2013-01-01 0
1 John Doe AL Sales 2013-01-01 2014-05-01 0 manager 2011-10-01 2017-07-01 0 master 2008-12-15 1 2013-01-01 1 2013-01-01 2014-05-01 0
1 John Doe NY Sales 2014-05-01 NULL 1 manager 2011-10-01 2017-07-01 0 master 2008-12-15 1 2014-05-01 1 2014-05-01 2017-07-01 0
1 John Doe NY Sales 2014-05-01 NULL 1 director 2017-07-01 NULL 1 master 2008-12-15 1 2017-07-01 1 2017-07-01 NULL 1
38 Ivy Johnson NY Sales 2020-06-01 NULL 1 operator 2020-06-01 NULL 1 bachelor 2014-12-15 1 2020-06-01 1 2020-06-01 NULL 1

I tried with CROSS APPLY, but is returning only one line for each id. I'm trying with CASE WHEN but the query output is not exactly equal the desired result. In my return the column 'FUNCTION' and 'START_DATE_FUN' not follow the sequence (sort) presented in the desired result, the same occur for columns 'UNIVERSITY_GRADUATION' and 'CONCLUSION_DATE'.

The query:

select 
    *
from(
    select 
        tb.*
        ,row_number() over(partition by tb.id,tb.max_date order by tb.max_date) as seq
        ,tb.max_date as [start]
        ,lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) as [end] 
        ,case when lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) is null then 1 else 0 end as last_record_flg
    from(
        select 
            sb.id
            ,sb.[name]
            ,sb.subsidiary
            ,sb.department
            ,sb.start_date_dep
            ,sb.end_date_dep
            ,sb.last_record_flg as lr_sb
            ,fc.[function]
            ,fc.start_date_fun
            ,fc.end_date_fun
            ,fc.last_record_flg as lr_fc
            ,gd.university_graduation
            ,gd.end_date_grad
            ,gd.last_record_flg as lr_gd
            ,case
                when sb.start_date_dep >= fc.start_date_fun and sb.start_date_dep >= gd.end_date_grad then sb.start_date_dep
                when fc.start_date_fun >= sb.start_date_dep and fc.start_date_fun >= gd.end_date_grad then fc.start_date_fun
                else gd.end_date_grad
            end as max_date
        from 
            #Subsidiaries as sb
            left outer join #Functions as fc
                on sb.id = fc.id
            left outer join #Graduations as gd
                on sb.id = gd.id
    ) as tb
) as tb2
where
    tb2.seq = 1

Below the DDL:

create table #Subsidiaries (
    id int
    ,[name] varchar(15)
    ,subsidiary varchar(2)
    ,department varchar(15)
    ,start_date_dep date
    ,end_date_dep date
    ,last_record_flg bit
)
go

insert into #Subsidiaries values
(1,'John Doe','AL','Engineering','2005-10-01','2013-01-01',0),
(1,'John Doe','AL','Sales','2013-01-01','2014-05-01',0),
(1,'John Doe','NY','Sales','2014-05-01',null,1),
(38,'Ivy Johnson','NY','Sales','2020-06-01',null,1)
go

create table #Functions (
    id int
    ,[function] varchar(15)
    ,start_date_fun date
    ,end_date_fun date
    ,last_record_flg bit
)
go

insert into #Functions values
(1,'operator','2005-10-01','2009-08-01',0),
(1,'leader','2009-08-01','2011-10-01',0),
(1,'manager','2011-10-01','2017-07-01',0),
(1,'director','2017-07-01',null,1),
(38,'operator','2020-06-01',null,1)
go

create table #Graduations (
    id int
    ,university_graduation varchar(15)
    ,end_date_grad date
    ,last_record_flg bit
)
go

insert into #Graduations values
(1,'bachelor','2005-12-15',0),
(1,'master','2008-12-15',1),
(38,'bachelor','2014-12-15',1)
go

Upvotes: 1

Views: 803

Answers (1)

ito
ito

Reputation: 157

Case when someone find the same difficult to join two or more SCD type 2, I could find a reference in this link https://sqlsunday.com/2014/11/30/joining-two-scd2-tables/ (SQL Sunday) that help me to build the query and use the range intervals in the join condition to return result as desired.

Upvotes: 1

Related Questions