SQRCAT
SQRCAT

Reputation: 5840

INNER JOIN with NOT IN returns wrong results

I have a table product with fields such as title (varchar) or availability (bool).

The user can search for products by name (product.title), but can also apply certain filters (such as type, producer, etc.). I solved this using SELECT alongside INNER JOIN. An example of a generated query looks like this:

SELECT z.* 
FROM product z 
WHERE (z.title LIKE "%bread%") 
AND z.availability = 1 
AND z.category IN (4,5,6);

This works perfectly well. However, products also are tagged with "allergy tags" (such as gluten). Unfortunately, a product can hold any number of "allergy tags", thus i needed to use a crosslinking table (crosslink__productXproduct_allergy_tag):

id int not null primary key auto_increment
productId int
allergyTagID int

The tags are stored in table product_allergy_tag, using an id as primary key, and a title for the tag. Naturally, when searching for products that are gluten-free, the generated query would then look like this:

SELECT z.* 
FROM product z  
INNER JOIN crosslink__productXproduct_allergy_tag a ON z.id = a.productId  
WHERE (z.title LIKE "%bread%")
AND z.availability = 1  
AND a.allergyTagId IN (15)

(With 15 being the ID of the gluten tag).

However, if i ran the search in the same manner as shown above, a search for gluten-free products would end up showing only products that contain gluten.

So I figured: I simply need to reverse the logic:

Instead of:

a.allergyTagId IN (15)

it would be:

a.allergyTagId NOT IN (15)

Yet however, for reasons I do not understand, the query still returns some products with that allergy tag, although it does not return all of them.

I tried to get to the core of this by testing various options and comparing results, but to no avail.

Rumor has it, I am not an SQL wiz.

Any support on the matter would be helpful. Thank you!

Upvotes: 0

Views: 71

Answers (2)

Geoduck
Geoduck

Reputation: 9005

The problem is, the way a join works, it will join it with every match, so you'd get:

Product A - Allergen A Product A - Allergen B Product B - Allergen C

You also aren't getting any products that have NO allergens, since the INNER JOIN requires at least one allergen to be selected.

What you want is a JOIN and test that the allergen is NULL. Something like:

SELECT z.* FROM product z LEFT JOIN crosslink__productXproduct_allergy_tag a
  ON z.id = a.productId AND a.allergyTagId IN (15) WHERE (z.title LIKE 
 "%bread%") AND z.availability = 1 AND a.allergyTagId IS NULL

This does the allergen check at the time of the JOIN. If there are matches, it will include a row for each match with the allergyTagId set to the ID that matched, and if it doesn't match any, it will have NULL in that column. Then, filter on the NULL column to get products that don't hit on any allergens.

Upvotes: 1

Richard Theobald
Richard Theobald

Reputation: 777

SELECT z.* FROM product z where z.title like "%bread%" and z.availability = 1 and z.id not in (Select productId from crosslink__productXproduct_allergy_tag a where a.allergyTagId = 15)

Edit: What you want is the list of products excluding those in the list of products with allergytag of 15. The subquery finds the list of products with an allergytag of 15, and the primary query selects all of the ones that aren't those.

Upvotes: 2

Related Questions