I have just started to investigate system versioning (temporal tables), as it is a pretty cool feature of SQL Server. I was able to set it up successfully on one of my existing tables with the query below. However, whenever my daily ETL runs, it adds rows to the history table for every item in Table1, whether anything changed or not. My ETL uses Insert Into, Update, and Delete statements in an SSIS SQL Task. The ETL updates all existing rows, usually with the same data, but I was hoping system versioning would only add a history row when a row's data actually changed.
Is this a limitation of using an Update statement in a SQL Task in my ETL? Would using the Slowly Changing Dimension component in a Data Flow make a difference, or is there a better way to make this work?
Or is this a limitation of system versioning with ETLs, and should I use something else to track table changes?
CREATE SCHEMA History;
GO
ALTER TABLE Table1
ADD
    SysStartTime datetime2(0) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL
        CONSTRAINT DF_SysStart DEFAULT SYSUTCDATETIME(),
    SysEndTime datetime2(0) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL
        CONSTRAINT DF_SysEnd DEFAULT CONVERT(datetime2(0), '9999-12-31 23:59:59'),
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
GO
ALTER TABLE Table1
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = History.Table1));
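To confirm the behavior, I can watch the history grow with a query like this (Id is a placeholder for my key column, not a column from the table above):

-- Count history rows written per day; if the ETL rewrites every row,
-- this will roughly equal the row count of Table1 for each run date.
SELECT CAST(SysStartTime AS date) AS VersionDate, COUNT(*) AS RowsVersioned
FROM History.Table1
GROUP BY CAST(SysStartTime AS date)
ORDER BY VersionDate;

-- Or view every version of a single row with a temporal query.
SELECT *
FROM Table1
FOR SYSTEM_TIME ALL
WHERE Id = 42;  -- Id is a hypothetical key column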
Comments converted to an answer:
If you issue an UPDATE statement that is the equivalent of setting every field to itself, SQL Server doesn't have any shortcut logic built in that flags it as a no-op and does nothing. It sounds like your current ETL pattern is updating every record on every run, and that is why your history table is growing.
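To see this concretely, here is a minimal repro (the Id and Name columns are made up, not from the question):

-- Update a row to its current values; no data actually changes.
UPDATE dbo.Table1
SET Name = Name
WHERE Id = 1;

-- SQL Server still treats this as a real update: the prior version
-- of row 1 is now in the history table, with SysEndTime set to the
-- time of the UPDATE above.
SELECT Id, Name, SysStartTime, SysEndTime
FROM History.Table1
WHERE Id = 1;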
That makes sense. Would redesigning my ETL to only update changed rows be the best way to keep my history table from growing? Does a Slowly Changing Dimension table accomplish this, or is there a better way to make this happen? – Hslew 2 days ago
It's been ages since I used the Slowly Changing Dimension Wizard in SSIS. It was a bit crap back then, and I can't see that they've improved it at all. I have the most success with a source query that deduplicates the inbound data, followed by a Lookup to determine whether the inbound row matches an existing row. If it doesn't match, I load it into my table, as it's new data. If it matches, I do a second check to see whether it has actually changed; only if it has changed do I send it to a second (staging) table. SSIS doesn't scale updates well, so after the data flow completes, I update the destination from that staging table with an Execute SQL Task, along the lines of the sketch below.
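In T-SQL, the "only update what changed" check is commonly written with EXCEPT, which also compares NULLs sanely. A sketch, assuming a staging.Table1 holding the inbound rows and an Id key column (neither is from the question):

-- Update only rows whose non-key columns actually differ from the
-- inbound data; unchanged rows are never touched, so no history
-- rows are written for them.
UPDATE t
SET t.Col1 = s.Col1,
    t.Col2 = s.Col2
FROM dbo.Table1 AS t
INNER JOIN staging.Table1 AS s
    ON s.Id = t.Id
WHERE EXISTS (SELECT s.Col1, s.Col2
              EXCEPT
              SELECT t.Col1, t.Col2);

With that WHERE EXISTS ... EXCEPT predicate in place, system versioning only records versions for rows whose values genuinely changed.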