Samuel Vanga
Samuel Vanga

Reputation: 501

Different kind of incremental load

Please look at the following staging table. It has multiple rows for the same policy. Data to this table is loaded from a flat file I receive from external sources.

Column values can change between one row to the next row. See ColA. There could be limited columns populated in the first row. More columns will be populated in the next rows. See columns ColB and ColC, they are null initially and are populated in second and third rows.

`CREATE TABLE #Stg
(
  PolicyNum VARCHAR(10) ,
  ColA VARCHAR(10) ,
  ColB VARCHAR(10) ,
  ColC VARCHAR(10) ,
  TimeStampKey VARCHAR(100)
)

INSERT  #Stg
    ( PolicyNum, ColA, ColB, ColC, TimeStampKey )
VALUES  ( 'MDT1000', 'SomeVal_A1', NULL, NULL, '2013041113033140MDT1000ZA' )
    ,
    ( 'MDT1000', 'SomeVal_A2', 'SomeVal_B', NULL, '2013041113051756MDT1000ZA' )
    ,
    ( 'MDT1000', 'SomeVal_A3', 'SomeVal_B', 'SomeVal_C', '2013041113115418MDT1000ZA' )`

From this staging table I need to load data to a destination table while maintaing history. Destination table is basically a type 2 slowly chaning dimension. In other words, I've load the first row from staging because it doesn't exist and update it with the second row and update again with the third row.

Folliwing is an example of destination schema:

CREATE TABLE #Dst
(
PolicyKey INT IDENTITY(1,1) PRIMARY KEY
, PolicyNum VARCHAR(10)
, ColA VARCHAR(10) 
, ColB VARCHAR(10)
, ColC VARCHAR(10)
, IsActive BIT
, RowStartDate DATETIME
, RowEndDate DATETIME
)

Normally I'd write a merge statement or an SSIS package to handle incremental loads and scd dimensions, but since original record and change records are in the same file the standard approach doesn't work.

I'd appreciate if you can throw some light on how to approach this. I'm trying to avoid row by row operations.

Thanks, Sam.

Upvotes: 0

Views: 919

Answers (1)

Anoop Verma
Anoop Verma

Reputation: 1505

try this:

SELECT 
  Stg.*
FROM Stg 
INNER JOIN 
(
    SELECT PolicyNum, MAX (TimeStampKey) AS MAX_TimeStampKey 
    FROM Stg
    GROUP BY PolicyNum 
 ) T 
 ON T.PolicyNum = Stg.PolicyNum
 AND T.MAX_TimeStampKey = Stg.TimeStampKey

The result:

 PolicyNum  ColA       ColB       ColC       TimeStampKey
 ---------- ---------- ---------- ---------- -------------------
 MDT1000    SomeVal_A3 SomeVal_B  SomeVal_C  2013041113115418MDT1000ZA

Upvotes: 0

Related Questions