Reputation: 683
I have read at length a few times now setting up the CDC features in SQL Server 2012. There is however a very confusing graphic
that seems to demonstrate that your CDC data can be stored in another database (data warehouse). This however is not what I would consider truly 'storing' it.
There is no ETL functionality when you follow the instruction on setting up the 'data collection' unit. My understanding of the data collection is for monitoring over time your database activity, not to go back and review changes to a particular rowset.
So I must be missing some fundamental piece of logic. Is there some sort of magic CDC feature to either bulk insert or move the data from the cdc changetable (that only keeps 3 days by default) itself or do I need to write a custom SSIS/ETL job for every table I setup for CDC? I am ok writing the SSIS job but it seems like this is most likely not the correct approach. If anyone can point me to a tutorial dealing with storing cdc data on another database or instance I would appreciate it. My searches just keep pointing me to data collection or cdc, and neither of these are actually storing 100% of change history.
Thanks for the request to edit. I was thinking it was confusing:
My requirements are the following:
Capture all changes on a specific set of table (CDC to the rescue and it works)
Store the change data in a different relational database for 7 years. ie. Be able to go back to a particular record and get all changes made to it over any period of time. (This is the part that confuses me about the MSDN graphic and apparent solution).
Data ETL job should if possible be done in realtime (not in batch or on schedule)
Upvotes: 1
Views: 890
Reputation: 32737
As a technology, CDC is mostly used to facilitate keeping a secondary data store (e.g. a data warehouse) up to date with changes that happened at the primary. That said, there's nothing saying that you can't do what you're looking to do with CDC. Essentially, you'll set up CDC on the tables that you're interested in tracking the changes for and call cdc.fn_cdc_get_all_changes...
or cdc.fn_cdc_get_net_changes...
(whichever is appropriate for your situation, though I'd guess all changes by your description) periodically to get the changes for those tables, at which point you'll store the raw cdc data off into another table that you've designated for holding it.
Note, however, that the retention requirement is rough. For instance, you can't just delete records after the retention period is over because as long as the record is still active, you need an unbroken chain of changes back to the original insert of the record.
Upvotes: 1