Reputation: 9216

MySQL with huge tables: 2 queries faster then 1 (index not used)

I have two tables:

CREATE TABLE `linf` (
  `ID` bigint(20) NOT NULL AUTO_INCREMENT,
  `glorious` bit(1) DEFAULT NULL,
  `limad` varchar(127) COLLATE utf8_bin DEFAULT NULL,
  `linfDetails_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `FK242415D3B0D13C` (`linfDetails_id`),
  CONSTRAINT `FK242415D3B0D13C` FOREIGN KEY (`linfDetails_id`) REFERENCES `linfdetails` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=135111 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

(130K rows)

and

CREATE TABLE `messageentry` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `mboxOffset` bigint(20) DEFAULT NULL,
  `mboxOffsetEnd` bigint(20) DEFAULT NULL,
  `from_id` bigint(20) DEFAULT NULL,
  `linf_ID` bigint(20) DEFAULT NULL,
  `mailSourceFile_id` bigint(20) DEFAULT NULL,
  `messageDetails_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FKBBB258CB60B94D38` (`mailSourceFile_id`),
  KEY `FKBBB258CB11F9E114` (`from_id`),
  KEY `FKBBB258CBF7C835B8` (`messageDetails_id`),
  KEY `FKBBB258CBB10E8518` (`linf_ID`),
  CONSTRAINT `FKBBB258CBB10E8518` FOREIGN KEY (`linf_ID`) REFERENCES `linf` (`ID`),
  CONSTRAINT `FKBBB258CB11F9E114` FOREIGN KEY (`from_id`) REFERENCES `emailandname` (`id`),
  CONSTRAINT `FKBBB258CB60B94D38` FOREIGN KEY (`mailSourceFile_id`) REFERENCES `mailsourcefile` (`id`),
  CONSTRAINT `FKBBB258CBF7C835B8` FOREIGN KEY (`messageDetails_id`) REFERENCES `messagedetails` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5888892 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

(5M rows)

I need to find linf by linf.limad and then find all messages that correspond to this linf.

If I select it in two queries:

select sql_no_cache l.id from linf l where l.limad='test@';
select sql_no_cache me.* from messageentry me where me.linf_id = 118668;

then it takes 0.06 seconds.

If I use

select sql_no_cache me.* from messageentry me where me.linf_id in(
select l.id from linf l where l.limad='test@') ;

it takes 10 secs to execute. And this one:

select sql_no_cache me.* from messageentry me, linf l where me.linf_id=l.id
and l.limad='test@';

takes 4 seconds. (Times are stable)

This request retuns 0 results because there is no messages for this linf. In fact, I've stripped this from big request

select messageent1_.*
from
    MailSourceFile mailsource0_,        
    MessageEntry messageent1_ ,
    MessageDetails messagedet2_,    
    Linf linf3_
where
    messageent1_.messageDetails_id = messagedet2_.id
        and messageent1_.linf_ID = linf3_.ID
        and linf3_.limad = 'test@'
and mailsource0_.id = messageent1_.mailSourceFile_id

which works ~1 minute. Isn't that too much? Explain says that messageEntries index is not used:

mysql> explain select sql_no_cache me.* from messageentry me, linf l where me.linf_id=l.id and l.limad='test@';
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+
| id | select_type | table | type   | possible_keys      | key     | key_len | ref              | rows    | Extra       |
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+
|  1 | SIMPLE      | me    | ALL    | FKBBB258CBB10E8518 | NULL    | NULL    | NULL             | 5836332 |             |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY            | PRIMARY | 8       | skryb.me.linf_ID |       1 | Using where |
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+

Any ideas why? I've gained mysql ~1.6 G of memory and this should fit all tables.

Thanx.

Upvotes: 3

Answers (4)

newtover

Reputation: 32094

Let's look at the query:

select sql_no_cache me.*
from messageentry me, linf l
where me.linf_id=l.id
and l.limad='test@';

What does it do? According to the execution plan from the EXPLAIN for each row in me table it checks if there is a corresponding record in linf. Since you do not have any index on limad field, MySQL 5M times fetches the value of the limad field from disk (not from memory) to check if it is equal to '@test'. You say that the query returns 0 rows, but for another limad value that would give more rows it would need to go on disk for all me.* fields.

Ok, limad field is varchar(127) COLLATE utf8_bin, that is an index on it might be expesive (I would add it anyway). 130k rows is less than 5M, so it would be great to start with linf, and all we need from messageentry for the start is id, mailSourceFile_id, messageDetails_id. Why only those fields? Since we are going to make two more joins and we take no data from the joined tables, the tables seem to narrow the final result set, that is they are required for the skeleton of the query. Let's start with them only:

SELECT me.id, me.mailSourceFile_id, me.messageDetails_id
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID);

The query selects required linf_ID as and then for each found id looks apprepriate rows in messageentry. Since you have an index on linf_iD, the query should result faster than 4 secs.

But those me.mailSourceFile_id, me.messageDetails_id can't be taken from memory, since MySQL would need to do a complex index merge, hence, MySQL would anyway go on disk for each row with matching linf_ID. If you would have an index that contains all those three fields at once, the query would be even faster in case there is a significant amount of rows which are filtered by subsequent joins.

If you update your KEY FKBBB258CBB10E8518 (linf_ID) to FKBBB258CBB10E8518 (linf_ID, mailSourceFile_id, messageDetails_id), you would have such an index.

The resulting query would look something like:

SELECT me.*
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID)
JOIN MailSourceFile ms ON ms.id = me.mailSourceFile_id
JOIN MessageDetails md ON md.id = me.messageDetails_id;

Actually, as soon as you update the index FKBBB258CBB10E8518 (linf_ID) as suggested above, your original query most probably will have the same execution plan and timing as the last query.

Upvotes: 3

whunmr

Reputation: 2435

Try using INT instead BIGINT if possible, also choose INT for primary key if possible. The secondary index like "linf_ID" store it's related primary key in disk. using BIGINT means more page faults and disk reads. http://planet.mysql.com/entry/?id=13825
To reduce the index size of varchar, try index part of limad.
In book "High Performance Mysql 3Edition" give us a way to choose the length of varchar index. choose the length which let the following two sql's result similar

SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;

SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT() AS sel3, COUNT(DISTINCT LEFT(city, 4))/COUNT() AS sel4, COUNT(DISTINCT LEFT(city, 5))/COUNT() AS sel5, COUNT(DISTINCT LEFT(city, 6))/COUNT() AS sel6, COUNT(DISTINCT LEFT(city, 7))/COUNT(*) AS sel7 FROM sakila.city_demo;
let MySQL analyze and optimize your data in disk http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
For your 1minute run "big request" SQL in questions, to optimize this SQL, you need using multiple column index. http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

CREATE UNIQUE INDEX idx_name ON MessageEntry(linf_ID, messageDetails_id, mailSourceFile_id)

Upvotes: 0

akatakritos

Reputation: 9858

What happens if you explicitly define the join criteria, like so?

select sql_no_cache me.* 
from messageentry me JOIN linf l ON  me.linf_id=l.id
WHERE l.limad='test@';

You may get something screwy with your version if the optimizer chooses to do a cross join or something else weird.

Barring that, you might consider doing a force index:

select sql_no_cache me.* 
from messageentry me FORCE INDEX (FKBBB258CBB10E8518)
JOIN linf l ON  me.linf_id=l.id         
WHERE l.limad='test@';

This will at least show you if the index is actually going to help you or not.

Upvotes: 0

Gordon Linoff

Reputation: 1270391

MySQL does a very poor job with subqueries in in clauses, explaining the poor performance you see there. I suspect the join performance has to do with the ordering of the joins. It is probably reading the messages table in its entirety.

Try changing the in version to an exists:

select sql_no_cache me.*
from messageentry me
where exists (select 1 from linf l where l.limad='test@' and l.id = me.inf_id limit 1) ;

By the way, you should get used to doing joins in on clauses rather than in the where clause.

Upvotes: 0

MySQL with huge tables: 2 queries faster then 1 (index not used)

Answers (4)

Related Questions