Reputation: 417
I have quite a large table that I need to import into my BI environment for reporting. I have an SSIS package that calls a stored procedure runs every 20 minutes to extract data from source and populate it into my table. The earliest date on the source table is 01-January-2012.
What I would like is for the first time the package runs it will import all the data from source for the month of January 2012. The next time it runs it will populate all the data for February 2012 and so on.
The below is the query I would use to extract the data - this is based on Created and Modified Dates
Select ID, Name, Company, Job, HRID, PayID, CreatedOn, ModifiedOn
from dbo.HRDetails
where CreatedOn between @MonthStart and @MonthEnd
or ModifiedOn between @MonthStart and @MonthEnd
I just need help on how I would make this incremental to pick up the data month on month dynamically?
Any help would be appreciated
-Jess
Upvotes: 2
Views: 155
Reputation: 131180
Instead of using dates, you can use Change Tracking to identify all keys modified since the last update, even deleted ones. The feature is available in all versions and editions since 2005, even Express.
Work with Change Tracking (SQL Server) shows how you can retrieve any changes made to a table since the last sync operation.
This query will return all modified rows and the reason they were modified from the Product
table since the version specified in last_synchronization_version
. Any deleted rows will appear with D
in the SYS_CHANGE_OPERATION
field :
SELECT
CT.ProductID, P.Name, P.ListPrice,
CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
CT.SYS_CHANGE_CONTEXT
FROM
SalesLT.Product AS P
RIGHT OUTER JOIN
CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
P.ProductID = CT.ProductID
The sync version you'll use for the next iteration should be retrieved before selecting changes. You can retrieve it with :
SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();
This query is very fast because it joins on the table's primary keys.
Another nice thing about this is that it doesn't care if you forget to run it - it will still pull all changes since the last execution. Running this more frequently results in better performance since the query has fewer changes to return.
Upvotes: 0
Reputation: 10277
In your stored procedure, pull the current max date from the loaded table and set your variables based on that:
DECLARE @DateLoaded = ISNULL((SELECT MAX(dateField) FROM yourLoadedTable),'20120101') --MAX date loaded
DECLARE @MonthStart = DATEADD(DAY,1,EOMONTH(@DateLoaded)) --End of max loaded month, plus 1 day to get first day of next month
DECLARE @MonthEnd = EOMONTH(@DateLoaded, 1) --End of next month
Select ID, Name, Company, Job, HRID, PayID, CreatedOn, ModifiedOn
from dbo.HRDetails
where CreatedOn between @MonthStart and @MonthEnd
or ModifiedOn between @MonthStart and @MonthEnd
I like this type of approach because it's self-repairing if a pull fails, even if you've missed a few months before noticing the issue.
Upvotes: 2
Reputation: 5208
I would do this by creating a metadata table in SQL Server.
Each time the package runs, insert a date/identifier into that table to note that the task has been completed for that month (as your final package step). For the first package step, you would use that table to get the next month that hasn't been completed (you would store this in a variable for use in the later insert). (You would also have a default/start month, which would be used if the table is empty.)
Upvotes: 0