Reputation: 3401
item_tag_map
has two column item_id
and tag_id
and both of them have index.
Here's a data sample:
item_id tag_id
1 1
1 3
4 7
1 5
3 1
3 8
6 8
10 4
Now I want to obtain item ids which have tags 1,2,3,5 and sort the result by the total count of all tags.
Here's a result sample:
item_id count(m.tag_id)
1 3
3 1
The SQL I tried was:
SELECT m.item_id,count(m.tag_id) from item_tag_map AS m
WHERE tag_id in(1,2,3,5)
GROUP BY m.item_id
ORDER BY count(m.tag_id)
LIMIT 10
There're about 10k rows in this table and the query was very slow. I tried to remove all count
statement, then it became very fast than before.
Why would count
slow down this query? How to optimize this query to make it fast?
Upvotes: 2
Views: 450
Reputation: 20540
This is because of ORDER BY COUNT(m.tag_id)
.
MySQL needs to fetch all rows (ie. do a full table scan) to calculate the count for each value of item_id
.
MySQL is not able to use the index in this case. (as you may realize when looking at EXPLAIN SELECT ..
)
When you remove the COUNT()
from the ORDER BY
clause, MySQL is able to use the index for sorting.
One possible solution for this would be to create a materialized view, where the DBMS caches the count of tag_id
values per item_id
in a seperate table.
MySQL doesn't support materialized views natively, but you can simulate them:
You can initially create the table once using the query in question (INSERT INTO tag_counts SELECT ...
) and then keep it updated using ON [INSERT | DELETE]
triggers.
Alternatively, there is a third party software named FlexViews which automates this process for you.
That's how i kept my multi-million-rows-per-week statistics database reactive.
Upvotes: 4