Reputation: 135
I have an interesting issue and requirement for a large multi-schema database.
- The database is around 130 GB in size.
- It is a multi-schema database; each customer has a schema.
- We currently have 102,247 tables in the system.
- Microsoft SQL Server 2008 R2.
This is due to customers' customisation requirements, all served through a single defined front end. The issue we have is that our database backups have become astronomical, and getting a database restore done to retrieve lost/missing/incorrect data is a nightmare. The initial product did not define audit trails and we don't store 'changes' to data; we simply have one version of the data.
Getting lost data back basically means restoring the full 130 GB backup and then applying differentials/transaction log backups just to get at the data.
We want to introduce a 'changeset' for each important table within each schema: essentially holding a baseline set of the data, then any modified/different data as it is saved, every X minutes. This will have to be a SQL Agent job initially, but I want to know what the best method would be.
Essentially I would run a script to create the 'backup' tables in each schema for the tables we wish to keep backed up.
Then a job would run every X minutes to cycle through each schema and insert the current data, then new/changed rows as it spots a change (based on the ModifiedDate of the row). It would retain this changelog for around a month before overwriting itself.
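Roughly, the backup table in each schema might look something like this (table and column names are just placeholders for illustration):

CREATE TABLE Customer042.Orders_Changes
(
    UniqueID     int           NOT NULL,
    col1         nvarchar(100) NULL,
    col2         nvarchar(100) NULL,
    col3         nvarchar(100) NULL,
    ModifiedDate datetime      NOT NULL,
    CapturedAt   datetime      NOT NULL DEFAULT CURRENT_TIMESTAMP  -- when the job copied the row
);

-- Retention: run alongside the capture job to drop anything older than a month
DELETE FROM Customer042.Orders_Changes
WHERE CapturedAt < DATEADD(MONTH, -1, CURRENT_TIMESTAMP);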
We would still have our larger backups, but we wouldn't need such a long retention period for them. My point is: what is the best and most efficient method of checking for changed data and performing the insert?
My gut feeling would be:
INSERT INTO BACKUP_table (UniqueID, col1, col2, col3)
SELECT UniqueID, col1, col2, col3 FROM [table] WHERE ModifiedDate >= DATEADD(mi, -90, CURRENT_TIMESTAMP)
*rough SQL
This would have to run in a loop that goes through all schemas; a rough sketch of such a loop is below. A number of tables won't have changed data.
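Something like the following, assuming each audited table is called Orders with an Orders_Changes backup table in every schema (the names and the dynamic-SQL approach are just an illustration):

DECLARE @sql nvarchar(max) = N'', @IntervalMinutes int = 90;

-- Build one INSERT per schema that actually contains the audited table
SELECT @sql = @sql + N'
INSERT INTO ' + QUOTENAME(s.name) + N'.Orders_Changes (UniqueID, col1, col2, col3, ModifiedDate)
SELECT UniqueID, col1, col2, col3, ModifiedDate
FROM ' + QUOTENAME(s.name) + N'.Orders
WHERE ModifiedDate >= DATEADD(MINUTE, -@p, CURRENT_TIMESTAMP);'
FROM sys.schemas AS s
JOIN sys.tables AS t ON t.[schema_id] = s.[schema_id]
WHERE t.name = N'Orders';

EXEC sys.sp_executesql @sql, N'@p int', @p = @IntervalMinutes;

Schemas whose table has nothing modified in the window simply insert zero rows, so no separate "has anything changed" check is strictly needed.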
Is this even a good method?
What does SO think?
Upvotes: 0
Views: 1734
Reputation: 280429
My first response would be to consider keeping each customer in their own database instead of their own schema within one massive database; there are several key benefits to doing this.
I managed such a system for several years at my previous job; managing 500 databases was no more complex than managing 10, and the only difference to your applications is the database part of the connection string (which is actually easier to make queries adapt to than a schema prefix).
If you're really committed to keeping everyone in a single database, then what you can consider doing is storing the important tables of each schema in their own filegroup, and moving everything out of the primary filegroup. Now you can back up those filegroups independently and, based solely on the full primary backup and a piecemeal restore of the individual filegroup backup, you can bring just that customer's schema online in another location and retrieve the data you're after (maybe copying it over to the primary database using import/export, BCP, or simple DML queries), without having to completely restore the entire database. Moving all user data out of the primary filegroup minimizes the time it takes to restore that initial backup and get you on to restoring the specific customer's filegroup. While this makes your backup/recovery strategy a little more complex, I believe it does achieve what you're after.
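Roughly, and with made-up database/filegroup/path names, the moving parts look like this:

-- Give one customer's important tables their own filegroup
ALTER DATABASE BigDb ADD FILEGROUP FG_Customer042;
ALTER DATABASE BigDb ADD FILE
    (NAME = N'Customer042_data', FILENAME = N'D:\Data\BigDb_Customer042.ndf')
    TO FILEGROUP FG_Customer042;

-- Relocate a table by rebuilding its clustered index on the new filegroup
CREATE UNIQUE CLUSTERED INDEX PK_Orders
    ON Customer042.Orders (OrderID)
    WITH (DROP_EXISTING = ON)
    ON FG_Customer042;

-- Back up PRIMARY and the customer filegroup independently
BACKUP DATABASE BigDb FILEGROUP = N'PRIMARY' TO DISK = N'E:\Backups\BigDb_primary.bak';
BACKUP DATABASE BigDb FILEGROUP = N'FG_Customer042' TO DISK = N'E:\Backups\BigDb_Customer042.bak';

-- Piecemeal restore elsewhere: PRIMARY first, then just that one filegroup
-- (add MOVE clauses if the file paths differ on the restore server)
RESTORE DATABASE BigDb_Recovery FILEGROUP = N'PRIMARY'
    FROM DISK = N'E:\Backups\BigDb_primary.bak' WITH PARTIAL, NORECOVERY;
RESTORE DATABASE BigDb_Recovery FILEGROUP = N'FG_Customer042'
    FROM DISK = N'E:\Backups\BigDb_Customer042.bak' WITH NORECOVERY;
-- Under the full recovery model, apply the subsequent log backups, then recover.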
Another option is to use a custom log shipping implementation with an intentional delay. We did this for a while by shipping our logs to a reporting server, but waiting 12 hours before applying them. This gave us protection from customers shooting themselves in the foot and then requiring a restore - if they contacted us within 12 hours of their mistake, we likely already had the "before-screw-up" data online on the reporting server, making it trivial to fix it on the primary server. It also doubled as a reporting server for reports looking at data older than 12 hours, taking substantial load away from the primary server.
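A very rough sketch of the delayed-apply step on the reporting server (the queue table, file paths, and the copy job that populates the queue are hypothetical):

DECLARE @file nvarchar(260);

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT FileName
    FROM dbo.LogBackupQueue            -- populated by the copy job as .trn files arrive
    WHERE Applied = 0
      AND BackupFinishDate < DATEADD(HOUR, -12, CURRENT_TIMESTAMP)   -- the intentional delay
    ORDER BY BackupFinishDate;

OPEN c;
FETCH NEXT FROM c INTO @file;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- STANDBY keeps the database readable for reporting between restores
    RESTORE LOG BigDb_Report FROM DISK = @file WITH STANDBY = N'D:\Standby\BigDb_Report.tuf';
    UPDATE dbo.LogBackupQueue SET Applied = 1 WHERE FileName = @file;
    FETCH NEXT FROM c INTO @file;
END

CLOSE c;
DEALLOCATE c;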
You can also consider Change Data Capture, but you will obviously need to test the performance and the impact on the rest of your workload. This solution will also depend on the edition of SQL Server you're using, since CDC is not available in Standard, Web, Workgroup, etc.
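For reference, enabling it on one important table looks roughly like this (schema and table names are placeholders; net changes require a primary key or unique index on the source table):

USE BigDb;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'Customer042',
    @source_name          = N'Orders',
    @role_name            = NULL,
    @supports_net_changes = 1;

-- Changes accumulate in cdc.Customer042_Orders_CT and can be read with
-- cdc.fn_cdc_get_all_changes_Customer042_Orders(@from_lsn, @to_lsn, N'all');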
Upvotes: 1