Cade
Cade

Reputation: 38

How do I only bring in rows that have changed since last ssis run

I have a ssis package that runs every hour to bring in sales. currently I am bringing in all rows from the beginning of the business, but would like to only bring in rows that are new and those that have been changed since last run. what I need to do is somehow get ssis to create a last run column and then be able to use that time to filter all incoming rows that are either new or have been changed. not sure how go about doing this, any suggestions are welcome.

Upvotes: 0

Views: 1087

Answers (2)

Panagiotis Kanavos
Panagiotis Kanavos

Reputation: 131492

You can use Change Tracking to retrieve changes made to a table since a specific version number. This can catch deletions which you can't handle with auditing columns.

Change Tracking is available in all SQL Server editions (ie even Express) so you don't have to worry about licensing.

Copying from the docs, you can load all changes since a specific version with :

SELECT  
    CT.ProductID, P.Name, P.ListPrice,  
    CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,  
    CT.SYS_CHANGE_CONTEXT  
FROM  
    SalesLT.Product AS P  
RIGHT OUTER JOIN  
    CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT  
ON  
    P.ProductID = CT.ProductID

@last_synchronization_version is the database version used the last time the ETL job run.

There are various ways to get the last version.

One way would be to store the current version number with CHANGE_TRACKING_CURRENT_VERSION() when the ETL job starts into a table and use it as the @last_synchronization_version for the next run. The first time @last_synchronization_version would be 0.

You could use an OLEDB Command to load the last stored version from eg a JobHistory table and the current version at the same time :

select max(NextVersion) as VersionSince, CHANGE_TRACKING_CURRENT_VERSION() as NextVersion
from JobHistory;

And store the next version with at the end of the package :

insert into JobHistory (...,NextVersion)
valures (...,@NextVersion)

This way you can easily retrieve all changes made since the last run.

Another option is to include each row's version number in the results by adding the SYS_CHANGE_VERSION column to the query

SELECT  
    CT.ProductID, P.Name, P.ListPrice,  
    CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,  
    CT.SYS_CHANGE_CONTEXT  ,
    CT.SYS_CHANGE_VERSION as TrackingVersion
FROM  
    SalesLT.Product AS P  
RIGHT OUTER JOIN  
    CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT  
ON  
    P.ProductID = CT.ProductID

On each run you can retrieve the maximum version from the target table and use it to load changes from the source.

SELECT ISNULL(MAX(TrackingVersion),0) as LastVersion
From TargetTable 

Another option would be to store the overall database version per row, instead of each row's version.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270021

You look at the columns createdAt and updatedAt, which you meticulously maintain for all the rows in your table, probably using triggers (although you might do this through application logic as well).

Let me assume that updatedAt is set when a row is created. Then you can do:

select t.*
from t
where t.updatedAt >= @LastReadDate;

Upvotes: 2

Related Questions