abs786123

Reputation: 609

Delta Load - ETL process to identify updates

I have a table with over 30 million rows and it takes too long to do a truncate and reload. I can handle the incremental load using the SSIS not-matched output, but I am struggling with the delta load, i.e. identifying records that have been updated. The table has no date fields, so there is no way of checking when data was last modified.

Just wondering if anyone has come across a similar situation and how you resolved it. I don't want to load the full table, as it can take 3 hours on this one table alone!

Upvotes: 0

Views: 3512

Answers (2)

Arun Goudar

Reputation: 381

You can use the Change Tracking feature supported by data stores such as Azure SQL Database and SQL Server to identify the delta data. You can find detailed implementation steps here: SYS_CHANGE_VERSION
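For context, here is a minimal T-SQL sketch of that approach. The database, table, and key names (MyDatabase, dbo.MyTable, MyKey) are placeholders, not from the question, and the retention period is just an example setting.

    -- Enable change tracking at the database level (retention period is a tuning choice).
    ALTER DATABASE MyDatabase
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    -- Enable change tracking on the source table.
    ALTER TABLE dbo.MyTable
    ENABLE CHANGE_TRACKING;

    -- On each run, ask only for rows changed since the last synced version.
    DECLARE @last_sync_version bigint = 0;  -- read this from your own watermark/control table

    SELECT ct.MyKey,
           ct.SYS_CHANGE_OPERATION,   -- I = insert, U = update, D = delete
           ct.SYS_CHANGE_VERSION,
           t.*                        -- join back for the current column values
    FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS ct
    LEFT JOIN dbo.MyTable AS t
           ON t.MyKey = ct.MyKey;

    -- Persist this as the new watermark for the next run.
    SELECT CHANGE_TRACKING_CURRENT_VERSION() AS current_version;

The CHANGETABLE result only tells you which keys changed and how, so the join back to the source table is what supplies the updated column values for the load.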

Upvotes: 0

Hank

Reputation: 31

On the matched output, do another lookup that checks whether the incoming row is different from the row with the matching primary key. If it is different, do an update; if it is the same, do nothing.

I know of two ways you can check to see if the rows are the same.

1) Add a derived column to the new incoming row that creates a hash value of every column. Then create the same hash column in your local table and compare the hashes in the lookup (see the sketch after this list).

2) In the lookup, join every field...i.e. A-->A, B-->B, C-->C, etc. Then redirect the non-matches to do an update. This may not be possible if you have blob columns.

Number 1 is probably more efficient but harder to set up.
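A minimal sketch of option 1, assuming SQL Server 2017+ (for CONCAT_WS) and placeholder table/column names (dbo.DestTable, dbo.StagingTable, KeyCol, ColA/ColB/ColC) that are illustrative only:

    -- Persisted hash column on the destination so it is computed once per row.
    ALTER TABLE dbo.DestTable
    ADD RowHash AS CAST(
            HASHBYTES('SHA2_256',
                CONCAT_WS('|', ColA, ColB, ColC)   -- list every non-key column here
            ) AS varbinary(32)) PERSISTED;

    -- Compute the same hash on the incoming rows (in the source query or a derived
    -- column), compare it to RowHash on the matched output, and only update rows
    -- whose hashes differ; rows with equal hashes are ignored.
    UPDATE d
    SET    d.ColA = s.ColA,
           d.ColB = s.ColB,
           d.ColC = s.ColC
    FROM   dbo.DestTable AS d
    JOIN   dbo.StagingTable AS s
           ON s.KeyCol = d.KeyCol
    WHERE  HASHBYTES('SHA2_256', CONCAT_WS('|', s.ColA, s.ColB, s.ColC)) <> d.RowHash;

Note that CONCAT_WS skips NULL arguments, so if NULLs are possible you may want an explicit placeholder (e.g. ISNULL(ColA, '~NULL~')) to avoid two different rows hashing to the same value.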

Upvotes: 0
