JOE SKEET
JOE SKEET

Reputation: 8118

20 hour query from hell

i am running this query on a table that has half a million records with about 7 fields:

delete from qvalues where rid not in  
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1)

and rid not in  (select min(rid) from qvalues

group by name,compound,rowid
having COUNT(*)=1);

why is it taking SO LONG?

what can i do to optimize it?

im running sql server 2008

Upvotes: 1

Views: 398

Answers (6)

nojetlag
nojetlag

Reputation: 753

What is your machines setup, do you have enough memory, where do you see the most utilisation while the query runs (CPU, Memory, Disk IO) ?

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 453648

As well as considering batching and indexing you can also rewrite the query itself to remove the sub queries and be more efficient.

;WITH cte As
(
SELECT ROW_NUMBER() OVER (PARTITION BY name, compound, rowid ORDER BY rid) AS RN
FROM qvalues
)
DELETE FROM cte WHERE RN > 1

How many duplicates per group will there be likely to be? If many it might be quicker to do an insert of the records to keep into a new table and then a drop and rename.

Upvotes: 1

JNK
JNK

Reputation: 65187

1 - Use batching. This lets you resume, and gives you an idea of progress. As an example:

DECLARE @MSG Varchar(max)

WHILE 1=1
    BEGIN
        DELETE TOP (100000) qvalues
        FROM qvalues WITH (TABLOCKX)
        <logic here>
        IF @@ROWCOUNT < 100000 BREAK        
        SET @Msg = 'Deleted another 10 Million'
        SET @Msg = @Msg + ' ' +CONVERT(varchar(20),GETDATE(),101)+' '+CONVERT(varchar(20),GETDATE(),108) 
        RAISERROR(@Msg, 0, 1) WITH NOWAIT
    END

Note that I also added a WITH (TABLOCKX) hint, which puts a table lock on and eliminates row-level locking. It'll cause issues with concurrent reads but hopefully you don't have anything else querying that table while you are deleting.

2 - Fix your logic This is impossible to write for you without a better idea of your table structure, but some options are: - Materialize a table with the values you want to compare against and do a join. If the delete is big enough you can make a clustered index on the temp table on the join field. I've used this a lot with great success. - If you expect to delete a large portion of the records, SELECT INTO a new table and drop the old one. This is a minimally logged operation and runs really quickly on SQL Server 2008 compared to a delete, which needs to log the values for each row. - Drop all your indexes but what you are using for selection and your clustered index. Keeping a clustered index is normally OK for a delete of this type if it's a relevant cluster to the query.

Upvotes: 1

slartidan
slartidan

Reputation: 21598

first thought:

delete from qvalues where rid not in  
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1

UNION

select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)=1);

Maybe its also a good idea to ensure, that the sql-server knows, that you are doing a "uncorrelated subselect" (because "correlated subselects" take much longer):

delete from qvalues a where a.rid not in  
(
select min(b.rid) from qvalues b
group by b.name,b.compound,b.rowid
having COUNT(*)>1

UNION

select min(c.rid) from qvalues c
group by c.name,c.compound,c.rowid
having COUNT(*)=1);

and of course you should consider to use indexes (especially on rid, but also on name, compound, rowid)

My SQLs are not tested - I hope you get the idea of what I was trying to show.

PS: your sql requires a lot of calculations (especially the HAVING clauses), could you try to find another solution for your problem?

Upvotes: 0

Dave D
Dave D

Reputation: 8972

Your best bet is to look at the execution plan and see what's taking the longest. I'd start by reducing the two not in queries to one:

delete 
from qvalues 
where rid not in
(
    select min(rid)
    from qvalues
    group by name, compound, rowid
    having count(1) >= 1
)

You might also want to look into putting an index on name, compound and rowid

Upvotes: 2

Jeff
Jeff

Reputation: 2871

Not knowing the actual data involved, I can just give some general advice: Run each of the subqueries individually.

Also, am I reading this wrong, or are you deleting all but 2 entries from this table (if rid is unique)?

Upvotes: 1

Related Questions