First Namea A
First Namea A

Reputation: 21

Why my mysql answer that "not using key" when I use rand in where

I have a table that has 4,000,000 records. The table is created that : (user_id int, partner_id int, PRIMARY_KEY ( user_id )) engine=InnoDB; I want to test the performance of select 100 records. Then, I tested following:

mysql> explain select user_id from MY_TABLE use index (PRIMARY)  where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY     | MY_TABLE | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)

This is OK. But, this query is buffered by mysql. So, this test make no after the first test.

Then, I thinked of a sql that select by random value. I tested following:

mysql> explain select user_id from MY_TABLE use index (PRIMARY)  where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | PRIMARY     | MY_TABLE | index | NULL          | PRIMARY | 4       | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+

But, it's bad. Explain shows that possible_keys is NULL. So, full index scanning is planned, and in fact, it's too slow rather than the one before.

Then, I want to ask you to teach me how do I write random value with index looking up.

Thanks

Upvotes: 2

Views: 177

Answers (3)

Bill Karwin
Bill Karwin

Reputation: 562378

It's a limitation of the MySQL optimizer, that it can't tell that the subquery returns exactly one value, it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides it's just going to do an index scan.

Here's a workaround:

mysql> explain select user_id from MY_TABLE use index (PRIMARY)  
where user_id = ( select ceil( rand() ) );

Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.

To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.


Re your comment:

You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").

I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can workaround this limitation by calculating the random number in application code and then using the result as a constant value in your query:

select user_id from MY_TABLE use index (PRIMARY)  
where user_id = $random;

Upvotes: 0

johnnaras
johnnaras

Reputation: 139

See the definition of rand().

If i understand right, you are trying to get a random record from the database. If that is the case, again from the rand() definition:

ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:

SELECT * FROM table1, table2 WHERE a=b AND c<d -> ORDER BY RAND() LIMIT 1000;

Upvotes: 0

Spudley
Spudley

Reputation: 168695

Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.

However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).

My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.

The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.

Upvotes: 1

Related Questions