Jess8766
Jess8766

Reputation: 417

SQL - Incremental insert using SSIS?

I have quite a large table that I need to import into my BI environment for reporting. I have an SSIS package that calls a stored procedure runs every 20 minutes to extract data from source and populate it into my table. The earliest date on the source table is 01-January-2012.

What I would like is for the first time the package runs it will import all the data from source for the month of January 2012. The next time it runs it will populate all the data for February 2012 and so on.

The below is the query I would use to extract the data - this is based on Created and Modified Dates

Select ID, Name, Company, Job, HRID, PayID, CreatedOn, ModifiedOn
from dbo.HRDetails
where CreatedOn between @MonthStart and @MonthEnd
or ModifiedOn between @MonthStart and @MonthEnd

I just need help on how I would make this incremental to pick up the data month on month dynamically?

Any help would be appreciated

-Jess

Upvotes: 2

Views: 155

Answers (3)

Panagiotis Kanavos
Panagiotis Kanavos

Reputation: 131180

Instead of using dates, you can use Change Tracking to identify all keys modified since the last update, even deleted ones. The feature is available in all versions and editions since 2005, even Express.

Work with Change Tracking (SQL Server) shows how you can retrieve any changes made to a table since the last sync operation.

This query will return all modified rows and the reason they were modified from the Product table since the version specified in last_synchronization_version. Any deleted rows will appear with D in the SYS_CHANGE_OPERATION field :

SELECT  
    CT.ProductID, P.Name, P.ListPrice,  
    CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,  
    CT.SYS_CHANGE_CONTEXT  
FROM  
    SalesLT.Product AS P  
RIGHT OUTER JOIN  
    CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT  
ON  
    P.ProductID = CT.ProductID

The sync version you'll use for the next iteration should be retrieved before selecting changes. You can retrieve it with :

SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION(); 

This query is very fast because it joins on the table's primary keys.

Another nice thing about this is that it doesn't care if you forget to run it - it will still pull all changes since the last execution. Running this more frequently results in better performance since the query has fewer changes to return.

Upvotes: 0

Aaron Dietz
Aaron Dietz

Reputation: 10277

In your stored procedure, pull the current max date from the loaded table and set your variables based on that:

DECLARE @DateLoaded = ISNULL((SELECT MAX(dateField) FROM yourLoadedTable),'20120101') --MAX date loaded
DECLARE @MonthStart = DATEADD(DAY,1,EOMONTH(@DateLoaded)) --End of max loaded month, plus 1 day to get first day of next month
DECLARE @MonthEnd = EOMONTH(@DateLoaded, 1) --End of next month

Select ID, Name, Company, Job, HRID, PayID, CreatedOn, ModifiedOn
from dbo.HRDetails
where CreatedOn between @MonthStart and @MonthEnd
or ModifiedOn between @MonthStart and @MonthEnd

I like this type of approach because it's self-repairing if a pull fails, even if you've missed a few months before noticing the issue.

Upvotes: 2

Chris Mack
Chris Mack

Reputation: 5208

I would do this by creating a metadata table in SQL Server.

Each time the package runs, insert a date/identifier into that table to note that the task has been completed for that month (as your final package step). For the first package step, you would use that table to get the next month that hasn't been completed (you would store this in a variable for use in the later insert). (You would also have a default/start month, which would be used if the table is empty.)

Upvotes: 0

Related Questions