Reputation: 2449
Why would this query (and a number of similar variants) not use the index for ASIN on the 'tags' table? It insists on a full-table scan even when A contains just a few rows. As 'tags' table on production contains nearly a million entries, it's killing the query rather badly.
SELECT C.tag, count(C.tag) AS total
FROM
(
SELECT B.*
FROM
(
SELECT ASIN FROM requests WHERE user_id=9
) A
INNER JOIN tags B USING(ASIN)
) C
GROUP BY C.tag ORDER BY total DESC
EXPLAIN shows no index being used (run on test DB so rows in 'tags' is low, but still a full table scan):
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 0 | const row not found |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 28 | |
| 2 | DERIVED | B | ALL | NULL | NULL | NULL | NULL | 2593 | Using where; Using join buffer |
| 3 | DERIVED | borrowing_requests | ref | idx_user_id | idx_user_id | 5 | | 27 | Using where
Indexes:
| book_tags | 1 | asin | 1 | ASIN | A | 432 | NULL | NULL | | BTREE | |
| book_tags | 1 | idx_tag | 1 | tag | A | 1296 | NULL | NULL | | BTREE | |
| book_tags | 1 | idx_updated_on | 1 | updated_on | A | 518 | NULL | NULL | | BTREE
The query was rewritten from an INNER JOIN which was having the same problem:
SELECT tag, count(tag) AS total
FROM tags
INNER JOIN requests ON requests.ASIN=tags.ASIN
WHERE user_id=9
GROUP BY tag
ORDER BY total DESC
EXPLAIN:
| 1 | SIMPLE | tags | ALL | NULL | NULL | NULL | NULL | 2593 | Using temporary; Using filesort |
| 1 | SIMPLE | requests | ref | idx_ASIN,idx_user_id | idx_ASIN | 33 | func | 3 | Using where
I get the idea this is a real basic point I'm missing, but about 4 hours work on it has got me nowhere. Any advice is welcome.
EDIT:
I can see that the first query using sub-queries won't use indexes thanks to some replies, but this was being used as it ran twice as quick as the bottom query with just the INNER JOIN.
As an example, there are 70k rows in requests (all with an indexed ASIN), and 700k rows in tags, with 95k different ASINs in tags, each with less than 10 different tag records.
If a user has 10 requests, I only want the tags from those 10 ASINs to be listed and counted. In my mind, this should use tags.idx_ASIN and should lookup 100 rows (10 ASINs, each with max of 10 tags) at most from the tags table.
I'm missing something...I just can't see what.
EDIT:
requests CREATE TABLE:
CREATE TABLE IF NOT EXISTS `requests` (
`bid` int(40) NOT NULL AUTO_INCREMENT,
`user_id` int(20) DEFAULT NULL,
`ASIN` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
`status` enum('active','inactive','pending','deleted','completed') COLLATE utf8_unicode_ci NOT NULL,
`added_on` datetime NOT NULL,
`status_changed_on` datetime NOT NULL,
`last_emailed` datetime DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`bid`),
KEY `idx_ASIN` (`ASIN`),
KEY `idx_status` (`status`),
KEY `idx_added_on` (`added_on`),
KEY `idx_user_id` (`user_id`),
KEY `idx_status_changed_on` (`status_changed_on`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149380 ;
tags CREATE TABLE
CREATE TABLE IF NOT EXISTS `tags` (
`ASIN` varchar(10) NOT NULL,
`tag` varchar(50) NOT NULL,
`updated_on` datetime NOT NULL,
KEY `idx_tag` (`tag`),
KEY `idx_updated_on` (`updated_on`),
KEY `idx_asin` (`ASIN`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
There is no primary key on tags. I don't usually have tables without primary keys, but didn't see the need on this one. Could this be an issue?
AHA! Different charsets and collations. I shall correct that and try again!
Later:
That got it. Query went down from 10secs to 0.006secs. Thanks to everyone for getting me to look at this differently.
Upvotes: 2
Views: 2166
Reputation: 2449
AHA! Different charsets and collations. I shall correct that and try again!
Later:
That got it. Query went down from 10secs to 0.006secs. Thanks to everyone for getting me to look at this differently.
Upvotes: 0
Reputation: 424993
Try reversing the order of the tables in your original query:
SELECT tag, count(tag) AS total
FROM requests
INNER JOIN tags ON requests.ASIN=tags.ASIN
WHERE user_id=9
GROUP BY tag
ORDER BY total DESC
Upvotes: 0
Reputation: 7336
MySQL doesn't index subqueries. If you want indexes to improve performance of your queries, rewrite them to not use subqueries.
Upvotes: 1