Reputation: 1335
I have a large set of data (ie 20 milions row of each table) which looks kind of like this (I'm not really dealing with products and catogories, but the same kind of situation)
Table products
|id|name|created_at|
--------------------
|1 |....|2018-06...|
|2 |....|2018-06...|
|3 |....|2018-06...|
|4 |....|2018-06...|
|5 |....|2018-06...|
...etc.
And a table of categories
|id|product_id|category|description|
------------------------------------
|1 |1 | abc | def |
|2 |1 | ghi | jkl |
|3 |1 | mno | pqr |
|4 |2 | abc | stu |
|5 |2 | wvx | yz |
...etc
What I want to do is a search, to find products with many categories, ie.
SELECT DISTINCTROW * FROM product WHERE
product.id in (
SELECT categories.product_id FROM categories WHERE category = 'abc'
)
AND
product.id in(
SELECT categories.product_id FROM categories WHERE category = 'ghi'
)
AND
product.id in(
SELECT categories.product_id FROM categories WHERE category= 'mno'
)
AND 'some extra where' ORDER BY product.created_au LIMIT 10 offset 0
But this is really slow... I've tried different approaches on this, but everyone takes at least 30 seconds.
I've made index of the columns used for joining.
So basicly I want to do a search where a product have to match one or many categories. Also, later on, I want to search so that a product matches a category and a description (might be from different rows in categories table).
Any ideas? Or perhaps links where I can read more about cases like this?
Upvotes: 0
Views: 438
Reputation: 133360
First suggestion you could use INNER JOIN on subquery instead of IN
SELECT DISTINCTROW *
FROM product p
INNER JOIN (
SELECT categories.product_id FROM categories WHERE category = 'abc'
) t1 on p.id = t1.product_id
INNER JOIN (
SELECT categories.product_id FROM categories WHERE category = 'ghi'
) t2 on p.id = t2.product_id
INNER JOIN (
SELECT categories.product_id FROM categories WHERE category= 'mno'
) t3 p.id = t3.product_id
WHERE 'some extra where'
ORDER BY product.created_au LIMIT 10 offset 0
You could also try using a single subquery for obtain all the product_id with the 3 category
SELECT DISTINCTROW *
FROM product p
INNER JOIN (
SELECT categories.product_id FROM categories WHERE category IN ( 'abc','ghi', 'mno')
group by categories.product_id
having count(distinct category ) = 3
) t1 on p.id = t1.product_id
WHERE 'some extra where'
ORDER BY product.created_au LIMIT 10 offset 0
If the inner join doesn't speed up as expected, then be sure you have proper composite index on categories table
CREATE INDEX index_name ON categories(category,product_id );
Upvotes: 1
Reputation: 1269503
I would simply use GROUP BY
and HAVING
to get the products:
select c.product_id
from categories c
where c.category in ('abc', 'ghi', 'mno')
group by c.product_id
having count(*) = 3;
You can use join
, exists
, or in
to get the rest of the product information:
select p.*
from products p join
(select c.product_id
from categories c
where c.category in ('abc', 'ghi', 'mno')
group by c.product_id
having count(*) = 3
) cp
on c.product_id = p.i
where . . . -- other conditions on product
Upvotes: 0
Reputation: 860
A temporary table should help speeding up things.
CREATE TEMPORARY TABLE tmpCategories( category_id INT)
Find a way to insert the ids of the category that you want to search for into this table.
Then write a join similar to this:
SELECT p.* FROM product p
INNER JOIN categories c ON p.product_id = c.product_id
INNER JOIN tmpCategories tc ON tc.catrgory_id = c.category_id
Your query might not be exactly like this. But this approach may speed things up.
P.S. I'm typing from my phone so pardon the formatting :)
Upvotes: 2