user9073369
user9073369

Reputation:

DELETE and NOT IN query performance

I have following table structure:

Users

SomeItems

SomeOtherItems

I want to delete records from Users table which do not exist in SomeItems and SomeOtherItems tables.

I can do this:

DELETE from Users
FROM Users u
WHERE u.UserId NOT IN
   (SELECT DISTINCT SomeId FROM SomeItems

    UNION

    SELECT DISTINCT SomeId2 FROM SomeOtherItems)

However, it is very slow. I assume it executes the UNION query for every record, doesn't it? Is there any way to improve the performance?

Upvotes: 0

Views: 277

Answers (3)

john McTighe
john McTighe

Reputation: 1181

the simplest fix might just be to change UNION to UNION ALL

you'll see the effect of this by the removal of the stream aggregate component of the query plan.

after all you don't care if the list is duplicated

Upvotes: 2

Marc Guillot
Marc Guillot

Reputation: 6465

I would change the IN for two joins, the simpler the query, the easier is to optimize for the engine.

DELETE from U
FROM Users U
     left join SomeItems S1 on S1.SomeId = U.UserId
     left join SomeOtherItems S2 on S2.SomeID2 = U.UserId
WHERE S1.SomeId is null and S2.SomeID2 is null

Checking that S1.SomeID is null means that U.UserId was not present on SomeItems. Same for SomeOtherItems.

Be sure that you have indexes for SomeId and SomeID2 on SomeItems and SomeOtherItems.

Upvotes: 1

Aaron Bertrand
Aaron Bertrand

Reputation: 280439

Applying distinct on two results and then unioning them (which applies yet a third distinct, all requiring sorts) is not the most efficient way to validate existence. How about:

DELETE u
  FROM dbo.Users AS u -- always use schema prefix!
  WHERE NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeItems WHERE SomeId = u.UserId
  )
  AND NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeOtherItems WHERE SomeID2 = u.UserId
  );

Upvotes: 5

Related Questions