Pandiyan Cool
Pandiyan Cool

Reputation: 6565

How to delete large amount of data in SQL Server without data loss?

I've been dealing with million of data deletion in day to day process.

Basically I have 4 tables.

Table_A
Table_B
Table_C
Table_D

I'm attempting to delete data older than 10 days in all tables.

Possibly I will be deleting around a million in each table. I have created a stored procedure to do these operations.

The steps I have followed to delete the data is

Step 1: Move the recent days (data which I have to preserve) to a temp table

select * into Table_A_Temp
from Table_A
where <<where clause last 10 days to till date>>

Step 2: Rename main table to old table (table with all days data)

exec sp_rename 'Table_A', 'Table_A_Old'

Step 3: Rename temp table to main table (table with data between last days to till date)

exec sp_rename 'Table_A_temp', 'Table_A'

Step 4: Query the temp table with time frame if any new data is inserted during the copy process

Insert into Table_A
select * from Table_A_old

Step 5: Drop old tables

DROP TABLE Table_A_old

Step 6: Create keys and constraints in main table (means renamed table)

code to create primary keys and constraints

Problem:

If I continuously inserting data into table while the stored procedure is running, I'm losing data for a few seconds.

Case 1: While renaming table

when I rename the main to old and temp table to main

I'm getting invalid object error (that table is exist error)

Case 2: Two of my tables have foreign key relation

If I insert data before creating constraints and key I'm getting related errors.

How to handle and delete the data properly without losing data.

Please advice the best practices.

Upvotes: 2

Views: 1216

Answers (3)

Joachim Langezaal
Joachim Langezaal

Reputation: 195

I was thinking of an option to make insert / update and deleted trigger on the table_A. This trigger copies the data to #TEMP_Table. You can rename the two tables after 10 days

Step 1 create trigger to copy the data to new table "#TEMP_Table"

 CREATE TRIGGER trig_table_a_copy
 ON [Table_A]
 AFTER UPDATE
 AS BEGIN
     (excess code)       
 END

 AFTER INSERT
 AS BEGIN
  (excess code)
 END

 AFTER DELETE
 AS BEGIN
    (excess code)
 END
  GO

Step 2 wait 10 days. Check if two tables have the same data

Step 3 Rename the tables

exec sp_rename 'Table_A', 'Table_A_Old'
exec sp_rename '#TEMP_Table', 'Table_A'

Step 4 drop the old table.

DROP TABLE Table_A_old

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270391

Presumably, this is an on-going need.

What you should do is partition the tables. The place to being is to learn about partitioning in the documentation.

Probably the simplest method is the following:

  • Copy each table into a new location.
  • Redefine each table and add a partitioning function.
  • Load the data back into the tables.

For the third step, just load the most recent data that you want.

Then, in the future, you can simply drop the oldest partition each day. You can set up a SQL Server Agent job to do this and the system will work automatically.

Upvotes: 1

Tanner
Tanner

Reputation: 22743

To avoid losing data, I would just delete the records where they are as opposed to creating/renaming tables. SQL Server will deal with the inserts and deletions for you to prevent data loss. There's a few ways you can do this, below is one suggestion.

Step 1: Get the Id/Identifiers of the rows you wish to delete and store them in the temp table.

SELECT Id 
INTO #TEMP_Table_A_RowsToDelete
FROM Table_A
WHERE <<your_date_column <= 10 days ago>>

You will then have a temp table #TEMP_Table_A_RowsToDelete containing the Id values of the rows you are going to delete.

Step 2 (optional): Use this table to delete any related data in referenced tables if they exist.

DELETE t1
FROM ForeignKeyTable t1
INNER JOIN #TEMP_Table_A_RowsToDelete t2
  ON t1.[ForeignKeyColumn] = t2.Id

This will delete any related data in tables linked to the primary key on your main table. You will repeat this for all linked tables to ensure the foreign key constraint doesn't prevent you deleting rows in the parent table.

Step 3: Delete the rows from the main table.

DELETE t1
FROM Table_A t1
INNER JOIN #TEMP_Table_A_RowsToDelete t2
  ON t1.[Id] = t2.Id

This will delete the rows older than 10 days, based on what you added to the temp table in Step 1. This shouldn't cause issues if you've deleted all related data in Step 2.

You will need to repeat the above steps for each table in your list.

Upvotes: 1

Related Questions