Reputation: 38
I have a ssis package that runs every hour to bring in sales. currently I am bringing in all rows from the beginning of the business, but would like to only bring in rows that are new and those that have been changed since last run. what I need to do is somehow get ssis to create a last run column and then be able to use that time to filter all incoming rows that are either new or have been changed. not sure how go about doing this, any suggestions are welcome.
Upvotes: 0
Views: 1087
Reputation: 131492
You can use Change Tracking to retrieve changes made to a table since a specific version number. This can catch deletions which you can't handle with auditing columns.
Change Tracking is available in all SQL Server editions (ie even Express) so you don't have to worry about licensing.
Copying from the docs, you can load all changes since a specific version with :
SELECT
CT.ProductID, P.Name, P.ListPrice,
CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
CT.SYS_CHANGE_CONTEXT
FROM
SalesLT.Product AS P
RIGHT OUTER JOIN
CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
P.ProductID = CT.ProductID
@last_synchronization_version
is the database version used the last time the ETL job run.
There are various ways to get the last version.
One way would be to store the current version number with CHANGE_TRACKING_CURRENT_VERSION() when the ETL job starts into a table and use it as the @last_synchronization_version
for the next run. The first time @last_synchronization_version
would be 0
.
You could use an OLEDB Command to load the last stored version from eg a JobHistory table and the current version at the same time :
select max(NextVersion) as VersionSince, CHANGE_TRACKING_CURRENT_VERSION() as NextVersion
from JobHistory;
And store the next version with at the end of the package :
insert into JobHistory (...,NextVersion)
valures (...,@NextVersion)
This way you can easily retrieve all changes made since the last run.
Another option is to include each row's version number in the results by adding the SYS_CHANGE_VERSION
column to the query
SELECT
CT.ProductID, P.Name, P.ListPrice,
CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
CT.SYS_CHANGE_CONTEXT ,
CT.SYS_CHANGE_VERSION as TrackingVersion
FROM
SalesLT.Product AS P
RIGHT OUTER JOIN
CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
P.ProductID = CT.ProductID
On each run you can retrieve the maximum version from the target table and use it to load changes from the source.
SELECT ISNULL(MAX(TrackingVersion),0) as LastVersion
From TargetTable
Another option would be to store the overall database version per row, instead of each row's version.
Upvotes: 1
Reputation: 1270021
You look at the columns createdAt
and updatedAt
, which you meticulously maintain for all the rows in your table, probably using triggers (although you might do this through application logic as well).
Let me assume that updatedAt
is set when a row is created. Then you can do:
select t.*
from t
where t.updatedAt >= @LastReadDate;
Upvotes: 2