user2557039
user2557039

Reputation: 71

increase speed of database query with mysql

I have the following query for a type-ahead search (as you type into the form it displays matches in a drop down). This query worked well until I switched to a database with about a million records. Now it takes 15 seconds for the match to be displayed. Because search hits are displayed as you type, the query is inside a loop. Is there anything about this query that can be changed to speed it up?

$diagnosis = isset($_GET['diagnosis']) ? $_GET['diagnosis'] : '';

$data = array();

if ($diagnosis) {
$query = explode(' ', $diagnosis);

for ($i = 0, $c = count($query); $i < $c; $i ++) {
    $query[$i] = '+' . mysql_real_escape_string($query[$i]) . '*';
}

$query = implode(' ', $query);

$sql = "SELECT diagnosis, icd9, MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE) AS relevance 
        FROM icd10 WHERE MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE) HAVING relevance > 0 ORDER BY relevance ";

$r = mysql_query($sql);

    while ($row = mysql_fetch_array($r)) {
        $data[] = $row;
    }
}

echo json_encode($data);
exit;

Upvotes: 0

Views: 512

Answers (1)

Solarflare
Solarflare

Reputation: 11106

You can try some stuff:

First, make sure you have a fulltext index for diagnosis. Second, make sure you have a fulltext index for diagnosis! A million rows isn't that much (depending on the number of words in diagnosis of course), so that just already might be the problem.

Then try the following code:

SELECT diagnosis, icd9, MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE) AS relevance 
FROM icd10 ORDER BY relevance desc limit 30

(It might not be obvious that this is faster, and it might not be, so just try it).

If you need to support short words, e.g. if 3 digit icd9-codes are entered often, you should check your ft_min_word_len / innodb_ft_min_token_size-values (depending on your database) to make sure they are included in the index - but be aware it will increase your index size. Maybe check the stopwords.

You didn't specify your setup; you can often improve general database performance by e.g. changing settings, hdds or ram. Especially ram.

Some general ideas: You might want to call the function asynchronously (the user should be able to type while the query runs). As soon as you hit less than 30 results (or whatever limit you set), you can just filter the remaining results on the fly in php (as long as the query gets longer/no words are removed) - it's the closest you get to a cache. Or set the limit to 1000 and filter manually afterwards, php regex is fast too, you just need a score-function. Depending on your data, you might want to not run the query when you just add a single letter to the query (every text will contain a word beginning with an "a", so you might not get a better result - that might not be the case for "q" though). That won't reduce runtime of the query, but you can just save one execution.

Upvotes: 1

Related Questions