Reputation: 2895
Here is a simplified version of my data:
products:
+----+-----------+
| id | name |
+----+-----------+
| 1 | Product X |
| 2 | Product Y |
| 3 | Product Z |
+----+-----------+
categories:
+----+---------------+
| id | name |
+----+---------------+
| 1 | Hotel |
| 2 | Accommodation |
+----+---------------+
category_product
+----+------------+-------------+
| id | product_id | category_id |
+----+------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+------------+-------------+
How do I construct an efficient query that will only retrieve products
that have both categories "Hotel" and "Accommodation" related (eg. Product X)?
I first tried a join approach
SELECT *
FROM products p
JOIN category_product cp
ON p.id = cp.product_id
WHERE cp.category_id = 1 OR cp.category_id = 2
^ This doesn't work because it doesn't contrain the query to containing both.
I have found an approach using sub-queries that works... but I've been warned against sub-queries for performance reasons:
SELECT *
FROM products p
WHERE
(
SELECT id
FROM category_product
WHERE product_id = p.id
AND category_id = 1
)
AND
(
SELECT id
FROM category_product
WHERE product_id = p.id
AND category_id = 2
)
Are there any better solutions (or how about alternatives)? I've considered de-normalizing categories to an extra column on products but would ideally like to avoid that. Hoping for a magic bullet solution!
UPDATE
I've run some of the (great) solutions provided in the answers: My data is 235 000 category_product rows and 58 000 products and obviously benchmarks are always dependent on environment and indexes etc.
"Relational division" @podiluska
2 categories: 2826 rows ~ 20ms
5 categories: 46 rows ~ 25-30 ms
8 categories: 1 rows ~ 25-30 ms
"Where exists" @Tim Schmelter
2 categories: 2826 rows ~ 5-7ms
5 categories: 46 rows ~ 30 ms
8 categories: 1 rows ~ 300 ms
One can see the results start to diverge with having a greater number of categories thrown in. I'll look at using "relational division" as it provides consistent results but implementation might cause me to look at "where exists" too (long format http://pastebin.com/6NRX0QbJ)
Upvotes: 2
Views: 342
Reputation: 4830
SELECT p.id
FROM products p
JOIN category_product cp
ON p.id = cp.product_id
WHERE cp.category_id IN (1,2)
GROUP BY p.id
HAVING COUNT(DISTINCT cp.category_id) = 2
Upvotes: -1
Reputation: 14263
SELECT categories.name,products.name
FROM
category_product,category,product
where
category_product.product_id=product.id
and
category_product.category_id=category.id
and
(
select count(1) from category_product
where
category_product.categoty_id=1
or
category_product.categoty_id=2
group by product_id having count(1)=2
)
Upvotes: 0
Reputation: 460298
I would use EXISTS
:
SELECT P.* FROM Products P
WHERE EXISTS
(
SELECT 1 FROM category_product cp
WHERE cp.product_id = p.id
AND category_id = 1
)
AND EXISTS
(
SELECT 1 FROM category_product cp
WHERE cp.product_id = p.id
AND category_id = 2
)
Upvotes: 0
Reputation: 117561
select *
from products p
where
(
select
count(distinct cp.category_id)
from category_product as cp
where
cp.product_id = p.id and
cp.category_id in (1, 2)
) = 2
or you can use exists
select *
from products p
where
exists
(
select
count(distinct cp.category_id)
from category_product as cp
where
cp.product_id = p.id and
cp.category_id in (1, 2)
having count(distinct cp.category_id) = 2
)
Upvotes: 0
Reputation: 51514
SELECT p.*
FROM products p
inner join
(
select product_ID
from category_product
where category_id in (1,2)
group by product_id
having count(distinct category_id)=2
) pc
on p.id = pc.product_id
This technique is called "relational division"
Upvotes: 4