Reputation: 8118
I am running this query on a table that has half a million records and about 7 fields:
delete from qvalues where rid not in
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1)
and rid not in (select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)=1);
Why is it taking SO LONG?
What can I do to optimize it?
I'm running SQL Server 2008.
Upvotes: 1
Views: 398
Reputation: 753
What is your machine's setup? Do you have enough memory, and where do you see the most utilisation while the query runs (CPU, memory, or disk IO)?
Upvotes: 0
Reputation: 453648
As well as considering batching and indexing, you can rewrite the query itself to remove the subqueries and make it more efficient.
;WITH cte As
(
SELECT ROW_NUMBER() OVER (PARTITION BY name, compound, rowid ORDER BY rid) AS RN
FROM qvalues
)
DELETE FROM cte WHERE RN > 1
How many duplicates per group are there likely to be? If many, it might be quicker to insert the records to keep into a new table and then do a drop and rename.
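That keep-and-swap approach might look something like the sketch below. The column list is illustrative (the table reportedly has about 7 fields), the new table name is made up, and any indexes, constraints, and permissions would need re-creating on the renamed table:

```sql
-- Keep one row per (name, compound, rowid) group in a new table.
SELECT rid, name, compound, rowid
INTO qvalues_keep
FROM (
    SELECT rid, name, compound, rowid,
           ROW_NUMBER() OVER (PARTITION BY name, compound, rowid
                              ORDER BY rid) AS rn
    FROM qvalues
) AS x
WHERE rn = 1;

DROP TABLE qvalues;
EXEC sp_rename 'qvalues_keep', 'qvalues';  -- re-create indexes/constraints afterwards
```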
Upvotes: 1
Reputation: 65187
1 - Use batching. This lets you resume, and gives you an idea of progress. As an example:
DECLARE @MSG Varchar(max)
WHILE 1=1
BEGIN
DELETE TOP (100000) qvalues
FROM qvalues WITH (TABLOCKX)
<logic here>
IF @@ROWCOUNT < 100000 BREAK
SET @Msg = 'Deleted another 100,000'
SET @Msg = @Msg + ' ' +CONVERT(varchar(20),GETDATE(),101)+' '+CONVERT(varchar(20),GETDATE(),108)
RAISERROR(@Msg, 0, 1) WITH NOWAIT
END
Note that I also added a WITH (TABLOCKX) hint, which puts a table lock on and eliminates row-level locking. It will cause issues with concurrent reads, but hopefully you don't have anything else querying that table while you are deleting.
2 - Fix your logic. This is impossible to write for you without a better idea of your table structure, but some options are:
- Materialize a table with the values you want to compare against and do a join. If the delete is big enough you can make a clustered index on the temp table on the join field. I've used this a lot with great success.
- If you expect to delete a large portion of the records, SELECT INTO a new table and drop the old one. This is a minimally logged operation and runs really quickly on SQL Server 2008 compared to a delete, which needs to log the values for each row.
- Drop all your indexes but what you are using for selection and your clustered index. Keeping a clustered index is normally OK for a delete of this type if it's a relevant cluster to the query.
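The first option (materialize the keepers and join) might be sketched like this, combined with batching; the names are illustrative and it is untested against the real schema:

```sql
-- Materialize the rids to keep, one per (name, compound, rowid) group.
SELECT MIN(rid) AS keep_rid
INTO #keepers
FROM qvalues
GROUP BY name, compound, rowid;

-- Cluster the temp table on the join field.
CREATE CLUSTERED INDEX cx_keepers ON #keepers (keep_rid);

-- Delete everything not in the keeper list, in batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (100000) q
    FROM qvalues AS q
    WHERE NOT EXISTS (SELECT 1 FROM #keepers AS k WHERE k.keep_rid = q.rid);

    IF @@ROWCOUNT < 100000 BREAK;
END
```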
Upvotes: 1
Reputation: 21598
First thought:
delete from qvalues where rid not in
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1
UNION
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)=1);
It may also be a good idea to ensure that SQL Server knows you are doing an uncorrelated subselect (correlated subselects take much longer):
delete a from qvalues a where a.rid not in
(
select min(b.rid) from qvalues b
group by b.name,b.compound,b.rowid
having COUNT(*)>1
UNION
select min(c.rid) from qvalues c
group by c.name,c.compound,c.rowid
having COUNT(*)=1);
And of course you should consider using indexes (especially on rid, but also on name, compound, and rowid).
My SQLs are not tested - I hope you get the idea of what I was trying to show.
PS: your SQL requires a lot of calculations (especially the HAVING clauses); could you try to find another solution to your problem?
Upvotes: 0
Reputation: 8972
Your best bet is to look at the execution plan and see what's taking the longest. I'd start by reducing the two NOT IN queries to one:
delete
from qvalues
where rid not in
(
select min(rid)
from qvalues
group by name, compound, rowid
having count(1) >= 1
)
You might also want to look into putting an index on name, compound, and rowid.
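For example, a covering index along those lines might look like this (the index name is made up; check the execution plan to confirm the optimizer actually uses it before keeping it):

```sql
CREATE NONCLUSTERED INDEX ix_qvalues_name_compound_rowid
ON qvalues (name, compound, rowid)
INCLUDE (rid);  -- lets the GROUP BY / MIN(rid) subquery scan the index alone
```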
Upvotes: 2
Reputation: 2871
Not knowing the actual data involved, I can just give some general advice: Run each of the subqueries individually.
Also, am I reading this wrong, or are you deleting all but 2 entries from this table (if rid is unique)?
Upvotes: 1