Reputation: 15844
I have a dictionary in my database with over a million records, and this simple select
select * from Word where languageId = 'en' order by rand() limit 1
randomly selects one word.
The problem is that this query takes 3-4 seconds, which is far too long because I have to repeat it many times.
Is there a way to achieve the same thing but much faster?
EDIT - table schema
wordId - integer, auto increment
languageId - varchar (FK), values like cs, en, de, ...
word - varchar, word itself
Data structure example
wordId languageId word
--------------------------
1 cs abatyše
...
100000 cs zip
100001 en aardvark
...
etc
SQL
CREATE TABLE Language (
languageId VARCHAR(20) NOT NULL ,
name VARCHAR(255) NULL ,
PRIMARY KEY(languageId));
CREATE TABLE Word (
wordId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
languageId VARCHAR(20) NOT NULL ,
word VARCHAR(255) NULL ,
PRIMARY KEY(wordId) ,
INDEX Word_FK_Language(languageId),
FOREIGN KEY(languageId)
REFERENCES Language(languageId)
ON DELETE NO ACTION
ON UPDATE NO ACTION);
Upvotes: 1
Views: 104
Reputation: 16214
If you have an ID column and the gaps between IDs are not huge (that is, not too many rows were deleted; otherwise the rows that follow a gap will be selected more often), then try this query:
SELECT * FROM `table`
WHERE id >=
(SELECT FLOOR( MAX(id) * RAND()) FROM `table` WHERE languageId = 'en' )
AND languageId = 'en'
ORDER BY id LIMIT 1;
See also the other approaches described here: http://akinas.com/pages/en/blog/mysql_random_row/
P.S. I just realized that this works well only without the languageId condition; otherwise the gaps between IDs for a single languageId could be huge.
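To make the gap problem concrete, here is a small simulation, offered only as a sketch. It mimics the `id >= FLOOR(MAX(id) * RAND())` rule in plain Python rather than running it against MySQL, and the id list is made up for illustration.

```python
import random
from collections import Counter


def pick_with_gaps(ids, rng):
    """Mimic `WHERE id >= FLOOR(MAX(id) * RAND()) ORDER BY id LIMIT 1`."""
    threshold = int(max(ids) * rng.random())       # FLOOR(MAX(id) * RAND())
    return min(i for i in ids if i >= threshold)   # first id at or above the threshold


def selection_counts(ids, trials=10_000, seed=0):
    """Count how often each id is picked over many simulated queries."""
    rng = random.Random(seed)
    return Counter(pick_with_gaps(ids, rng) for _ in range(trials))


# Hypothetical ids: three neighbours, then a large gap before id 100.
counts = selection_counts([1, 2, 3, 100])
print(counts.most_common())
```

Almost every threshold falls inside the gap, so id 100 should be returned for the vast majority of draws. The same thing would happen to the first 'en' row if it sits right after a long run of 'cs' rows.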
Update: Try this one; it could be a couple of times faster. I checked it against the execution time of your query, and it was about twice as fast.
SELECT d.* FROM
(SELECT @rn:=0 ) r,
-- + 1 so that rnd spans 1..count, matching rn, which starts at 1
(SELECT FLOOR(count(*)*RAND()) + 1 as rnd FROM `Word` WHERE languageId = 'en') t,
(SELECT @rn:=@rn+1 as rn, `Word`.* FROM `Word` WHERE languageId = 'en' ) d
WHERE d.rn >= t.rnd LIMIT 1
Basically, it still generates a continuous sequence of row numbers, but without sorting by them.
Last update: This one could be even faster (depending on the random number generated):
SELECT d.* FROM
( SELECT @rn:=@rn+1 as rn, w.*, t.rnd rnd FROM
(SELECT @rn:=0 ) r,
-- + 1 so that rnd is never 0; rn starts at 1, so rnd = 0 would return no row
(SELECT FLOOR(count(*)*RAND()) + 1 rnd FROM `Word` WHERE languageId = 'en') t,
`Word` w
WHERE w.languageId = 'en' AND @rn<t.rnd
) d
WHERE d.rn=d.rnd
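The idea behind these queries (count the matching rows, draw a random position, fetch the row at that position) can also be driven from application code. The sketch below is not the query above; it swaps in a plain `LIMIT 1 OFFSET n`, and it uses Python's built-in sqlite3 with a tiny made-up table purely as a stand-in for MySQL so that it runs anywhere. The function name `random_word` and the sample rows are assumptions.

```python
import random
import sqlite3


def random_word(conn, language_id, rng=random):
    """Return one uniformly random word for language_id, or None if there are none."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Word WHERE languageId = ?", (language_id,)
    ).fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count-1, each equally likely
    row = conn.execute(
        "SELECT word FROM Word WHERE languageId = ? "
        "ORDER BY wordId LIMIT 1 OFFSET ?",
        (language_id, offset),
    ).fetchone()
    return row[0]


# Tiny in-memory stand-in for the real table (made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Word (wordId INTEGER PRIMARY KEY, languageId TEXT, word TEXT)"
)
conn.execute("CREATE INDEX Word_FK_Language ON Word (languageId)")
conn.executemany(
    "INSERT INTO Word (languageId, word) VALUES (?, ?)",
    [("cs", "abatyše"), ("cs", "zip"), ("en", "aardvark"), ("en", "zebra")],
)
print(random_word(conn, "en"))
```

The database still has to step over the skipped rows, so this is not free, but it avoids generating and sorting a random value for every matching row. How much it helps on the real table would need to be measured.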
Upvotes: 3
Reputation: 1011
You could partition the table by the first letter of the word, randomly pick a letter, and then use your existing ORDER BY RAND() query to pick a random word within that partition. Sorting ~50,000 rows should be reasonably fast on a modern server. I think most database sorts are O(n log n), so sorting 1/26th of the records should be more than 26 times faster. Selecting the partition should cost next to nothing. On the other hand, fuzzyDunlop's suggestion of reusing the same list will still undoubtedly win out after 50 or so executions. Edit: I originally said more than 50 times faster, but I think I botched the log on the Windows calculator, so I'm going with more than 26 times ;)
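The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the sort cost is proportional to n * log2(n), that the table has 26 * 50,000 rows, and that the 26 partitions are equal in size (real first-letter partitions are not).

```python
import math


def sort_cost(n):
    """Idealised comparison-sort cost, proportional to n * log2(n)."""
    return n * math.log2(n)


def speedup(total_rows, partitions):
    """How many times cheaper it is to sort one equal-sized partition."""
    return sort_cost(total_rows) / sort_cost(total_rows / partitions)


total = 26 * 50_000  # assumed table size: 1.3 million rows
print(round(speedup(total, 26), 1))
```

Under these assumptions the ratio lands between 26 and 50, which agrees with the corrected figure. Note also that choosing a letter uniformly and then a word within it does not make every word equally likely, because words in small partitions are picked more often.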
Upvotes: 0
Reputation: 3387
Firstly, make sure your table is properly indexed. Does it have a primary key? Is languageId indexed? Make sure it is.
Secondly, are you only interested in the word, and not things like languageId or other fields in the table? If you are, you need this:
SELECT word_field FROM Word...
Wildcard SELECTs return everything, but you don't need to retrieve data you're never going to use.
Thirdly, are you just running the same query in a loop if you're repeating it many times? Change your LIMIT clause to return more words in one query:
-- for 10 words
... LIMIT 10
You can store this result for later use without having to re-query the database.
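As a rough sketch of that batching idea: fetch several random words in one query, hand them out one at a time, and go back to the database only when the buffer runs dry. The class name `WordBuffer` and the sample rows are made up, and Python's built-in sqlite3 stands in for MySQL here, so the query uses SQLite's RANDOM() where MySQL would use RAND().

```python
import sqlite3
from collections import deque


class WordBuffer:
    """Hand out random words one at a time, querying the database in batches."""

    def __init__(self, conn, language_id, batch_size=10):
        self.conn = conn
        self.language_id = language_id
        self.batch_size = batch_size
        self.queries = 0          # how many times the database was hit
        self._buffer = deque()

    def _refill(self):
        rows = self.conn.execute(
            "SELECT word FROM Word WHERE languageId = ? "
            "ORDER BY RANDOM() LIMIT ?",          # RAND() in MySQL
            (self.language_id, self.batch_size),
        ).fetchall()
        self.queries += 1
        self._buffer.extend(word for (word,) in rows)

    def next_word(self):
        if not self._buffer:
            self._refill()
        if not self._buffer:
            raise LookupError(f"no words for language {self.language_id!r}")
        return self._buffer.popleft()


# Tiny in-memory stand-in for the real table (made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Word (wordId INTEGER PRIMARY KEY, languageId TEXT, word TEXT)"
)
english = ["aardvark", "bee", "cat", "dog", "emu"]
conn.executemany(
    "INSERT INTO Word (languageId, word) VALUES ('en', ?)",
    [(w,) for w in english],
)

buffer = WordBuffer(conn, "en", batch_size=3)
words = [buffer.next_word() for _ in range(6)]
print(words, "fetched with", buffer.queries, "queries")
```

Six words cost two queries here instead of six. One difference from repeating the single-row query: words within one batch never repeat, which may or may not matter for your use.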
Finally, you can run your query with EXPLAIN in front of it to get an overview of what MySQL does when you run it.
EXPLAIN SELECT word_field FROM Word...
Using that, you can identify where exactly your query is running slowly.
Upvotes: 2