Query to exclude rows whose ID is listed in another table

Question

I'm having a hard time figuring out how to do an optimized query to do the following, even though it sounds simple.

Suppose I have a table called promo (one column: ID), and another table called promo_has_been_displayed_to_user (two columns: promo_id and user_id, and promo_id is a foreign key referencing promo.ID). I want a query that will return all the rows in promo whose ID field is NOT mentioned in any row in promo_has_been_displayed_to_user where the promo_has_been_displayed_to_user.user_id field is set to 45. Let's say I have indexes on all fields.

(The idea is that I have a database of promotional ads and a database of users, and each time I show an ad to a user, I store in promo_has_been_displayed_to_user that it has been shown to them. Now I want to find a new ad that has not been displayed to user 45.)

It seems the theoretically optimal way to do this would be the following:

1) Get a subset of promo_has_been_displayed_to_user where user_id=45, and in that subset, maintain the index on the user_id field. 2) For each row in promo, take the ID and look it up in the subset generated in step 1, on the indexed promo_id field. 3) Return all the rows in promo where you didn't find a match in step 2.

But how do I structure a query that reflects that?

Now, I have at least two queries that will return the right answer (I have verified with test data); the problem is that I don't think they will run optimally, for the following reasons:

1)

select * from promo
where ID not in (select promo_id from promo_has_been_displayed_to_user
                 where user_id=45)

The trouble with this query is that once you have the list of IDs returned by "select promo_id from promo_has_been_displayed_to_user where user_id=45", I assume it's just a list (without an index onit), and that the "not in" check is implemented by just checking that list one at a time. If the subset of promo_has_been_displayed_to_user where user_id=45 turns out to be huge, then for each row in promo, we have to search through a huge list with no index on it.

2)

select * from promo p
where not exists (select * from promo_has_been_displayed_to_user
                  where promo_id = p.ID and user_id=45)

This time, we are doing the lookup on the indexed promo_id field. However, for every row in promo, I'm querying the entire promo_has_been_displayed_to_user table. This is wasteful if there's only a small subset of promo_has_been_displayed_to_user where user_id=45.

Is there a single query that will combine the best of both worlds -- I first whittle down promo_has_been_displayed_to_user to the subset where user_id=45, and then for each row in promo, I do an indexed lookup on promo_id to see if there's a matching row in the subset?

(This is MySQL 5.0.95, although this sounds like something that isn't database-server-specific.)

Query to exclude rows whose ID is listed in another table

Answers (1)

Related Questions