Reputation: 745
I want to delete records from Table1 (Rows > 300 million records) based on this criteria (returns > 100 million records):
Column1 IS NULL AND Column2 IS NULL AND Column3 IS NULL AND Column4 IS NULL AND Column5 IS NULL AND Column6 IS NULL AND Column7 IS NULL
Table definition:
CREATE TABLE [dbo].[Table1](
[ID] [uniqueidentifier] NOT NULL,
[Column1] [float] NULL,
[Column2] [float] NULL,
[Column3] [float] NULL,
[Column4] [float] NULL,
[Column5] [float] NULL,
[Column6] [float] NULL,
[Column7] [float] NULL,
[Table2ID] [uniqueidentifier] NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Note: Table2ID is the foreign key contraint from Table2
CREATE TABLE [dbo].[Table2](
[ID] [uniqueidentifier] NOT NULL,
[Column1] [date] NULL,
[Column2] [tinyint] NULL,
[Column3] [tinyint] NULL,
CONSTRAINT [PK_Table2_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
I also want to delete all orphan records from Table2 (Row count around 10 million records) which are no longer referenced in Table1. Here are the approaches which I took:
a. Create temp table #table1 with Table1ID and Table2ID columns. Capture all relevant Table1IDs and run the below delete query.
Delete from Table1 where ID in (select Table1ID from #table1)
Delete from Table2 where ID in (select Table2ID from #table1)
The above query is taking enormous time possibly because of table scans.
b. Everything same as above but create unique clustered index on Table1ID column and unique index on Table2ID in #table1 and run the below query:
Delete from Table1 t1 join #table1 tmp1 on t1.ID=tmp1.Table1ID
Delete from Table2 t2 join #table1 tmp1 on t1.ID=tmp1.Table2ID
Is there a better way to handle this scenario? What is the best possible way to delete records from Table1 and Table2?
Note: I am aware of the approach where we create a new table, dump all the relevant data and rename it. Please provide suggestions on alternative approaches - pros and cons.
Upvotes: 3
Views: 2527
Reputation: 5120
If I uderstand correctly, filtered index and batch deletion should help (SQL Server 2008 or later):
create index IX_1 on dbo.Table1 (Column1, Column2, ...) include (Table2ID) where Column1 is NULL and Column2 is NULL and ...
create table #Deleted (Table2ID uniqueidentifier NULL)
create index #IX_Deleted on #Deleted (Table2ID) where Table2ID is not NULL
select 1
while @@rowcount > 0
delete top (50000) T
output deleted.Table2ID into #Deleted (Table2ID)
from dbo.Table1 T
where Column1 is NULL and Column2 is NULL and ...
select 1
while @@rowcount > 0
delete top (50000) T
from dbo.Table2 T
join (select distinct Table2ID from #Deleted where Table2ID is not NULL) D on D.Table2ID = T.ID
drop table #Deleted
drop index IX_1 on dbo.Table1
And, depending on need (one time operation, or periodical), index IX_1
can be kept.
On SQL Server 2005 the same can be done with an indexed computed column.
Upvotes: 0
Reputation: 2019
As You really want to delete high percentage, so you may consider this approach
SELECT col1, col2, ... INTO #SomeHoldingtable
FROM MyTable WHERE ..condition..
TRUNCATE TABLE MyTable
INSERT MyTable (col1, col2, ...)
SELECT col1, col2, ... FROM #SomeHoldingtable
Upvotes: 4