markdwhite
markdwhite

Reputation: 2449

Inner join will not use index

Why would this query (and a number of similar variants) not use the index for ASIN on the 'tags' table? It insists on a full-table scan even when A contains just a few rows. As 'tags' table on production contains nearly a million entries, it's killing the query rather badly.

SELECT C.tag, count(C.tag) AS total
FROM
(
    SELECT B.*
    FROM
    (
        SELECT ASIN FROM requests WHERE user_id=9
    ) A
    INNER JOIN tags B USING(ASIN)
) C
GROUP BY C.tag ORDER BY total DESC 

EXPLAIN shows no index being used (run on test DB so rows in 'tags' is low, but still a full table scan):

|  1 | PRIMARY     | <derived2>         | system | NULL          | NULL        | NULL    | NULL |    0 | const row not found            |
|  2 | DERIVED     | <derived3>         | ALL    | NULL          | NULL        | NULL    | NULL |   28 |                                |
|  2 | DERIVED     | B                  | ALL    | NULL          | NULL        | NULL    | NULL | 2593 | Using where; Using join buffer |
|  3 | DERIVED     | borrowing_requests | ref    | idx_user_id   | idx_user_id | 5       |      |   27 | Using where        

Indexes:

| book_tags |          1 | asin           |            1 | ASIN        | A         |         432 |     NULL | NULL   |      | BTREE      |         |
| book_tags |          1 | idx_tag        |            1 | tag         | A         |        1296 |     NULL | NULL   |      | BTREE      |         |
| book_tags |          1 | idx_updated_on |            1 | updated_on  | A         |         518 |     NULL | NULL   |      | BTREE

The query was rewritten from an INNER JOIN which was having the same problem:

SELECT tag, count(tag) AS total 
FROM tags 
INNER JOIN requests ON requests.ASIN=tags.ASIN 
WHERE user_id=9 
GROUP BY tag 
ORDER BY total DESC

EXPLAIN:

|  1 | SIMPLE      | tags          | ALL  | NULL                 | NULL     | NULL    | NULL | 2593 | Using temporary; Using filesort |
|  1 | SIMPLE      | requests | ref  | idx_ASIN,idx_user_id | idx_ASIN | 33      | func |    3 | Using where 

I get the idea this is a real basic point I'm missing, but about 4 hours work on it has got me nowhere. Any advice is welcome.

EDIT:

I can see that the first query using sub-queries won't use indexes thanks to some replies, but this was being used as it ran twice as quick as the bottom query with just the INNER JOIN.

As an example, there are 70k rows in requests (all with an indexed ASIN), and 700k rows in tags, with 95k different ASINs in tags, each with less than 10 different tag records.

If a user has 10 requests, I only want the tags from those 10 ASINs to be listed and counted. In my mind, this should use tags.idx_ASIN and should lookup 100 rows (10 ASINs, each with max of 10 tags) at most from the tags table.

I'm missing something...I just can't see what.

EDIT:

requests CREATE TABLE:

CREATE TABLE IF NOT EXISTS `requests` (
  `bid` int(40) NOT NULL AUTO_INCREMENT,
  `user_id` int(20) DEFAULT NULL,
  `ASIN` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `status` enum('active','inactive','pending','deleted','completed') COLLATE     utf8_unicode_ci NOT NULL,
  `added_on` datetime NOT NULL,
  `status_changed_on` datetime NOT NULL,
  `last_emailed` datetime DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`bid`),
  KEY `idx_ASIN` (`ASIN`),
  KEY `idx_status` (`status`),
  KEY `idx_added_on` (`added_on`),
  KEY `idx_user_id` (`user_id`),
  KEY `idx_status_changed_on` (`status_changed_on`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149380 ;

tags CREATE TABLE

CREATE TABLE IF NOT EXISTS `tags` (
  `ASIN` varchar(10) NOT NULL,
  `tag` varchar(50) NOT NULL,
  `updated_on` datetime NOT NULL,
  KEY `idx_tag` (`tag`),
  KEY `idx_updated_on` (`updated_on`),
  KEY `idx_asin` (`ASIN`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

There is no primary key on tags. I don't usually have tables without primary keys, but didn't see the need on this one. Could this be an issue?

AHA! Different charsets and collations. I shall correct that and try again!

Later:

That got it. Query went down from 10secs to 0.006secs. Thanks to everyone for getting me to look at this differently.

Upvotes: 2

Views: 2166

Answers (3)

markdwhite
markdwhite

Reputation: 2449

AHA! Different charsets and collations. I shall correct that and try again!

Later:

That got it. Query went down from 10secs to 0.006secs. Thanks to everyone for getting me to look at this differently.

Upvotes: 0

Bohemian
Bohemian

Reputation: 424993

Try reversing the order of the tables in your original query:

SELECT tag, count(tag) AS total 
FROM requests 
INNER JOIN tags ON requests.ASIN=tags.ASIN 
WHERE user_id=9 
GROUP BY tag 
ORDER BY total DESC

Upvotes: 0

ObscureRobot
ObscureRobot

Reputation: 7336

MySQL doesn't index subqueries. If you want indexes to improve performance of your queries, rewrite them to not use subqueries.

Upvotes: 1

Related Questions