Should I use derived table in this situation?

Question

I need to fetch 10 random rows from a table, the query below will not do it as it is going to be very slow on a large scale (I've read strong arguments against it):

SELECT `title` FROM table1 WHERE id1 = 10527 and id2 = 37821 ORDER BY RAND() LIMIT 10;

EXPLAIN:
select_type |    table    | type | possible_keys |  key  | key_len | ref  | rows | Extra          |
------------+-------------+------+---------------+-------+---------+------+------+----------------+ 
SIMPLE      |  table1     |  ref | id1,id2       |  id2  | 5       | const|  7   | Using where; Using temporary; Using filesort

I tried the following workaround:

SELECT * FROM
(SELECT `title`, RAND() as n1
FROM table1
WHERE id1 = 10527 and id2 = 37821) TTA
ORDER BY n1 LIMIT 10;

EXPLAIN:
select_type |    table    | type | possible_keys |  key  | key_len | ref  | rows | Extra          |
------------+-------------+------+---------------+-------+---------+------+------+----------------+ 
PRIMARY     |   |  ALL | NULL          |  NULL | NULL    | NULL |  7   | Using filesort |
DERIVED     |    table1   |  ref | id1,id2       |  id2  |   5     |const |  7   | Using where    |

But I’ve read also couple of statements against using derived tables.

Could you please tell me if the latter query is going to make any improvement?

Gordon Linoff · Accepted Answer

You should try the first method to see if it works for you. If you have an index on table1(id1, id2) and there are not very many occurrences of any given value pair, then the performance is probably fine for what you want to do.

Your second query is going to have somewhat worse performance than the first. The issue with the performance of order by rand() is not the time taken to calculate random numbers. The issue is the order by, and your second query is basically doing the same thing, with the additional overhead of a derived table.

If you know that there were always at least, say, 1000 matching values, then the following would generally work faster:

SELECT `title`
FROM table1
WHERE id1 = 10527 and id2 = 37821 and rand() < 0.05
ORDER BY RAND()
LIMIT 10;

This would take a random sample of about 5% of the data and with 1,000 matching rows, you would almost always have at least 10 rows to choose from.

Should I use derived table in this situation?

Answers (1)

Related Questions