dotslashlu
dotslashlu

Reputation: 3401

Count statement in SQL slows down the query

item_tag_map has two column item_id and tag_id and both of them have index.

Here's a data sample:

item_id     tag_id
1           1
1           3
4           7
1           5
3           1
3           8
6           8
10          4

Now I want to obtain item ids which have tags 1,2,3,5 and sort the result by the total count of all tags.

Here's a result sample:

item_id     count(m.tag_id)
1           3
3           1

The SQL I tried was:

SELECT m.item_id,count(m.tag_id) from item_tag_map AS m
WHERE tag_id in(1,2,3,5)
GROUP BY m.item_id
ORDER BY count(m.tag_id)
LIMIT 10

There're about 10k rows in this table and the query was very slow. I tried to remove all count statement, then it became very fast than before.

Why would count slow down this query? How to optimize this query to make it fast?

Upvotes: 2

Views: 450

Answers (1)

Kaii
Kaii

Reputation: 20540

This is because of ORDER BY COUNT(m.tag_id).
MySQL needs to fetch all rows (ie. do a full table scan) to calculate the count for each value of item_id.

MySQL is not able to use the index in this case. (as you may realize when looking at EXPLAIN SELECT ..)

When you remove the COUNT() from the ORDER BY clause, MySQL is able to use the index for sorting.


One possible solution for this would be to create a materialized view, where the DBMS caches the count of tag_id values per item_id in a seperate table.

MySQL doesn't support materialized views natively, but you can simulate them:
You can initially create the table once using the query in question (INSERT INTO tag_counts SELECT ...) and then keep it updated using ON [INSERT | DELETE] triggers.
Alternatively, there is a third party software named FlexViews which automates this process for you.

That's how i kept my multi-million-rows-per-week statistics database reactive.

Upvotes: 4

Related Questions