Reputation:
I have following table structure:
I want to delete records from Users table which do not exist in SomeItems and SomeOtherItems tables.
I can do this:
DELETE from Users
FROM Users u
WHERE u.UserId NOT IN
(SELECT DISTINCT SomeId FROM SomeItems
UNION
SELECT DISTINCT SomeId2 FROM SomeOtherItems)
However, it is very slow. I assume it executes the UNION query for every record, doesn't it? Is there any way to improve the performance?
Upvotes: 0
Views: 277
Reputation: 1181
the simplest fix might just be to change UNION to UNION ALL
you'll see the effect of this by the removal of the stream aggregate component of the query plan.
after all you don't care if the list is duplicated
Upvotes: 2
Reputation: 6465
I would change the IN for two joins, the simpler the query, the easier is to optimize for the engine.
DELETE from U
FROM Users U
left join SomeItems S1 on S1.SomeId = U.UserId
left join SomeOtherItems S2 on S2.SomeID2 = U.UserId
WHERE S1.SomeId is null and S2.SomeID2 is null
Checking that S1.SomeID is null means that U.UserId was not present on SomeItems. Same for SomeOtherItems.
Be sure that you have indexes for SomeId and SomeID2 on SomeItems and SomeOtherItems.
Upvotes: 1
Reputation: 280439
Applying distinct on two results and then unioning them (which applies yet a third distinct, all requiring sorts) is not the most efficient way to validate existence. How about:
DELETE u
FROM dbo.Users AS u -- always use schema prefix!
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.SomeItems WHERE SomeId = u.UserId
)
AND NOT EXISTS
(
SELECT 1 FROM dbo.SomeOtherItems WHERE SomeID2 = u.UserId
);
Upvotes: 5