user219882
user219882

Reputation: 15844

MySQL query performance optimization

I have a dictionary in my database which has over million records and this simple select

select * from Word where languageId = 'en' order by rand() limit 1

randomly selects one word.

The problem is that this request lasts 3-4 seconds which is very long because I have to repeat it many times.

Is there a way to achieve the same thing but much faster?

EDIT - table schema

wordId - integer, auto increment
languageId - varchar (FK), values like cs, en, de, ... 
word - varchar, word itself

Data structure example

wordId   languageId   word
--------------------------
1        cs           abatyše
...
100000   cs           zip
100001   en           aardvark
...
etc

SQL

CREATE TABLE Language (
  languageId VARCHAR(20)  NOT NULL  ,
  name VARCHAR(255)  NULL    ,
PRIMARY KEY(languageId));

CREATE TABLE Word (
  wordId INTEGER UNSIGNED  NOT NULL   AUTO_INCREMENT,
  languageId VARCHAR(20)  NOT NULL  ,
  word VARCHAR(255)  NULL    ,
PRIMARY KEY(wordId)  ,
INDEX Word_FK_Language(languageId),
  FOREIGN KEY(languageId)
    REFERENCES Language(languageId)
      ON DELETE NO ACTION
      ON UPDATE NO ACTION);

Upvotes: 1

Views: 104

Answers (3)

Cheery
Cheery

Reputation: 16214

If you have an IDs column and the gaps between the elements are not huge (not too many elements were removed, otherwise some elements will be selected more often) then try this query

SELECT * FROM `table` 
   WHERE id >= 
      (SELECT FLOOR( MAX(id) * RAND()) FROM `table` WHERE languageId = 'en' ) 
   AND languageId = 'en'
   ORDER BY id LIMIT 1;

And look at different examples here http://akinas.com/pages/en/blog/mysql_random_row/

ps: I just realized that it works good only without requirement for the languageId, otherwise the gaps in IDs for the same languageId could be huge.

Updated Try this one, it could be in a couple of times faster. I checked it against execution time of your query.. twice faster..

SELECT d.* FROM
  (SELECT @rn:=0 ) r, 
  (SELECT FLOOR(count(*)*RAND()) as rnd FROM `Word` WHERE languageId = 'en') t,  
  (SELECT @rn:=@rn+1 as rn, `Word`.* FROM `Word` WHERE languageId = 'en' ) d 
WHERE d.rn >= t.rnd LIMIT 1

basically it still creates some kind of continuous ids, but without sorting by them.

Last Update This one could be even faster (depends on the random number generated)

SELECT d.* FROM
  ( SELECT @rn:=@rn+1 as rn, w.*, t.rnd rnd FROM 
     (SELECT @rn:=0 ) r, 
     (SELECT FLOOR(count(*)*RAND()) rnd FROM `Word` WHERE languageId = 'en') t, 
     `Word` w 
   WHERE w.languageId = 'en' AND @rn<t.rnd
  ) d 
WHERE d.rn=d.rnd

Upvotes: 3

Tim
Tim

Reputation: 1011

You could partition the table by the first letter of the word, randomly pick a letter, and then use your existing sort to pick a random word within that partition. Sorting ~50,000 rows should be reasonably fast on a modern server. I think most database sorts are n lg(n) so 1/26th of the records should sort more than 50 times faster. The partition select should be negligible in terms of performance. On the other hand, fuzzyDunlop's comment about reusing the same list will still undoubtedly win out after 50 or so executions. Edit: I think I screwed up my log on the windows calc, so I'm going to go with: It should be more than 26 times faster ;)

Upvotes: 0

leemeichin
leemeichin

Reputation: 3387

Firstly, make sure your table is properly indexed. Does it have a primary key? Is languageId an index? Make sure it is.

Secondly, are you only interested in the word, and not things like languageId, or other fields in the table? If you are, you need this:

SELECT word_field FROM Word...

Wildcard SELECTs return everything, but you don't need to retrieve data you're never going to use.

Thirdly, are you just running the same query in a loop if you're repeating it many times? Change your LIMIT statement to return more words in one query:

-- for 10 words
... LIMIT 10

You can store this result for later use without having to re-query the database.

Finally, you can run your query, but with EXPLAIN in front of it to get an overview of what MySQL does when you run it.

EXPLAIN SELECT word_field FROM Word...

Using that, you can identify where exactly your query is running slowly.

Upvotes: 2

Related Questions