stephenfrank
stephenfrank

Reputation: 2895

MySQL query to determine records that are related to category A and B

Here is a simplified version of my data:

products:
+----+-----------+
| id | name      |
+----+-----------+
|  1 | Product X |
|  2 | Product Y |
|  3 | Product Z |
+----+-----------+

categories:
+----+---------------+
| id | name          |
+----+---------------+
|  1 | Hotel         |
|  2 | Accommodation |
+----+---------------+

category_product
+----+------------+-------------+
| id | product_id | category_id |
+----+------------+-------------+
|  1 |          1 |           1 |
|  2 |          1 |           2 |
|  3 |          2 |           1 |
|  4 |          3 |           2 |
+----+------------+-------------+

How do I construct an efficient query that will only retrieve products that have both categories "Hotel" and "Accommodation" related (eg. Product X)?

I first tried a join approach

SELECT *
FROM products p
JOIN category_product cp
ON p.id = cp.product_id
WHERE cp.category_id = 1 OR cp.category_id = 2

^ This doesn't work because it doesn't contrain the query to containing both.

I have found an approach using sub-queries that works... but I've been warned against sub-queries for performance reasons:

SELECT *
FROM products p
WHERE
(
    SELECT id
    FROM category_product
    WHERE product_id = p.id
    AND category_id = 1
)
AND
(
    SELECT id
    FROM category_product
    WHERE product_id = p.id
    AND category_id = 2
)

Are there any better solutions (or how about alternatives)? I've considered de-normalizing categories to an extra column on products but would ideally like to avoid that. Hoping for a magic bullet solution!

UPDATE

I've run some of the (great) solutions provided in the answers: My data is 235 000 category_product rows and 58 000 products and obviously benchmarks are always dependent on environment and indexes etc.

"Relational division" @podiluska

2 categories: 2826 rows  ~ 20ms 
5 categories: 46 rows ~ 25-30 ms 
8 categories: 1 rows ~ 25-30 ms 

"Where exists" @Tim Schmelter

2 categories: 2826 rows  ~ 5-7ms 
5 categories: 46 rows ~ 30 ms 
8 categories: 1 rows ~ 300 ms 

One can see the results start to diverge with having a greater number of categories thrown in. I'll look at using "relational division" as it provides consistent results but implementation might cause me to look at "where exists" too (long format http://pastebin.com/6NRX0QbJ)

Upvotes: 2

Views: 342

Answers (5)

Mitch Satchwell
Mitch Satchwell

Reputation: 4830

SELECT p.id
FROM products p
JOIN category_product cp
ON p.id = cp.product_id
WHERE cp.category_id IN (1,2)
GROUP BY p.id
HAVING COUNT(DISTINCT cp.category_id) = 2

Upvotes: -1

Arun Killu
Arun Killu

Reputation: 14263

SELECT categories.name,products.name 
FROM 
category_product,category,product 
where 
    category_product.product_id=product.id 
and 
   category_product.category_id=category.id 
    and 
   (
      select count(1) from category_product 
      where 
      category_product.categoty_id=1
      or 
      category_product.categoty_id=2 
     group by product_id having count(1)=2
   )

Upvotes: 0

Tim Schmelter
Tim Schmelter

Reputation: 460298

I would use EXISTS:

SELECT P.* FROM Products P
WHERE EXISTS
(
    SELECT 1 FROM category_product cp
    WHERE cp.product_id = p.id
    AND category_id = 1
)
AND EXISTS
(
    SELECT 1 FROM category_product cp
    WHERE cp.product_id = p.id
    AND category_id = 2
)

Upvotes: 0

roman
roman

Reputation: 117561

select *
from products p
where
    (
        select
            count(distinct cp.category_id)
        from category_product as cp
        where
            cp.product_id = p.id and
            cp.category_id in (1, 2)
    ) = 2

or you can use exists

select *
from products p
where
    exists
    (
        select
            count(distinct cp.category_id)
        from category_product as cp
        where
            cp.product_id = p.id and
            cp.category_id in (1, 2)
        having count(distinct cp.category_id) = 2
    )

Upvotes: 0

podiluska
podiluska

Reputation: 51514

SELECT p.*
FROM products p
     inner join 
(
    select product_ID
    from category_product
    where category_id in (1,2)
    group by product_id
    having count(distinct category_id)=2
) pc
    on p.id = pc.product_id

This technique is called "relational division"

Upvotes: 4

Related Questions