Reputation: 221
I'm having a hard time figuring out how to do an optimized query to do the following, even though it sounds simple.
Suppose I have a table called promo (one column: ID), and another table called promo_has_been_displayed_to_user (two columns: promo_id and user_id, and promo_id is a foreign key referencing promo.ID). I want a query that will return all the rows in promo whose ID field is NOT mentioned in any row in promo_has_been_displayed_to_user where the promo_has_been_displayed_to_user.user_id field is set to 45. Let's say I have indexes on all fields.
(The idea is that I have a database of promotional ads and a database of users, and each time I show an ad to a user, I store in promo_has_been_displayed_to_user that it has been shown to them. Now I want to find a new ad that has not been displayed to user 45.)
It seems the theoretically optimal way to do this would be the following:
1) Get a subset of promo_has_been_displayed_to_user where user_id=45, and in that subset, maintain the index on the user_id field. 2) For each row in promo, take the ID and look it up in the subset generated in step 1, on the indexed promo_id field. 3) Return all the rows in promo where you didn't find a match in step 2.
But how do I structure a query that reflects that?
Now, I have at least two queries that will return the right answer (I have verified with test data); the problem is that I don't think they will run optimally, for the following reasons:
1)
select * from promo
where ID not in (select promo_id from promo_has_been_displayed_to_user
where user_id=45)
The trouble with this query is that once you have the list of IDs returned by "select promo_id from promo_has_been_displayed_to_user where user_id=45", I assume it's just a list (without an index onit), and that the "not in" check is implemented by just checking that list one at a time. If the subset of promo_has_been_displayed_to_user where user_id=45 turns out to be huge, then for each row in promo, we have to search through a huge list with no index on it.
2)
select * from promo p
where not exists (select * from promo_has_been_displayed_to_user
where promo_id = p.ID and user_id=45)
This time, we are doing the lookup on the indexed promo_id field. However, for every row in promo, I'm querying the entire promo_has_been_displayed_to_user table. This is wasteful if there's only a small subset of promo_has_been_displayed_to_user where user_id=45.
Is there a single query that will combine the best of both worlds -- I first whittle down promo_has_been_displayed_to_user to the subset where user_id=45, and then for each row in promo, I do an indexed lookup on promo_id to see if there's a matching row in the subset?
(This is MySQL 5.0.95, although this sounds like something that isn't database-server-specific.)
Upvotes: 9
Views: 16157
Reputation: 26464
You cannot do this with an inner join. What you need to do is an antijoin. Usually this is most easily done using a query like this:
SELECT * FROM A WHERE id NOT IN (SELECT id FROM B);
That is the basic syntax of an antojoin in SQL.
An alternative way to do this however is to turn a LEFT JOIN into an antijoin and on some databases this performs better:
SELECT A.*
FROM A
LEFT JOIN B ON A.id = B.id
WHERE A.id IS NOT NULL AND B.id IS NULL;
This turns out to be equivalent and some databases can optimize it better.
Upvotes: 17