Denis O.
Denis O.

Reputation: 1850

Optimizing MySQL search query

Need yours help optimizing one mysql query. Lets take simple table for example.

CREATE TABLE `Modules` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `moduleName` varchar(100) NOT NULL,
 `menuName` varchar(255) NOT NULL,
PRIMARY KEY (`ID`),
KEY `moduleName` (`moduleName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Lets Fill it with some data:

INSERT INTO  `Modules` (`moduleName` ,`menuName`)
VALUES 
    ('abc1',  'name1'), 
    ('abc',  'name2'), 
    ('ddf',  'name3'), 
    ('ccc',  'name4'), 
    ('fer',  'name5');

And some sample string. Let it be abc_def;

Traditionally we are trying to find all the rows containing search string.

On the contrary, my task is, to find all rows which contains moduleName in input string. For now I have following query to get desired result:

SELECT `moduleName` ,`menuName` 
FROM `Modules` 
WHERE 'abc_def' LIKE(CONCAT(`moduleName`,'%'))

This will return

moduleName   | menuName 
---------------------------
abc          | name2

The problem is, that this query is not using index.

Is there some way to force it to use one?

Upvotes: 14

Views: 5793

Answers (12)

Angelin Nadar
Angelin Nadar

Reputation: 9300

We can achieve with one funciton itself instead of 2 functions as SUBSTRING('abc_def', 1, LENGTH(moduleName))

where locate(moduleName, 'abc_def');

Upvotes: 0

LSerni
LSerni

Reputation: 57378

(previous part of answer deleted - see newtover's answer which is the same, but better, for that).

newtover's approach #2 (see SQLfiddle) will get you an index hit with a simple query, and should offer better performances on longer tables:

SELECT `moduleName`, `menuName` 
FROM `Modules` 
WHERE moduleName = LEFT('abc_def', LENGTH(moduleName));

If you need data from a lot of columns (instead of only menuName), i.e. if Modules is larger as well as longer, you might be better served by moving moduleName into a lookup table containing only an ID, the moduleName and its length (to save one function call).

The actual extra space needed is small, and if moduleName has a low cardinality, i.e., you have few moduleNames repeated along lots of menuNames, you might actually end up saving considerable space.

The new schema will be:

moduleName_id    integer, keys to Lookup.id
...all the fields in Modules except moduleName...


Lookup table
   id            primary key
   moduleName    varchar
   moduleLength  integer

and the query:

SELECT `Lookup`.`moduleName`,`menuName` 
FROM `Modules` INNER JOIN `Lookup`
    ON (`Modules`.`moduleName_id` = Lookup.id)
WHERE `Lookup`.`moduleName` = LEFT('abc_def',
         `Lookup`.`moduleLength`);

This SQLfiddle starts from your schema and modifies it to achieve the above. Speed and storage space improvements strongly depend on what data you put in the tables. I intentionally put myself in the best conditions (many short fields in Modules, an average of one hundred menuNames for each moduleName) and was able to save around 30% of storage space; the search performances were only around 3x as fast, and probably biased by I/O caching, so unless someone runs more thorough testing, I'd leave it at "appreciable space and time savings are possible".

On the other hand, on small, simple tables and equal number of menu and modules (i.e. 1:1), there will be a slight storage penalty for no appreciable speed gain. In that situation however spaces and times involved will be very small, so maybe the more "normalized" form above might still be the way to go, despite the added complexity.

Upvotes: 3

newtover
newtover

Reputation: 32084

You seem to misunderstand what is index and how it can help to speed up a query.

Let's look at what is your moduleName index. It is basically a sorted list of mappings from moduleName to ID. And what you are selecting?

SELECT moduleName, menuName 
FROM Modules
WHERE 'abc_def' LIKE CONCAT(moduleName,'%');

That is you want some two fields for each row that has some relation to a somehow mapped value of moduleName field. How can ever index help you? There is no exact match, and there is no way to take advantage from the fact that we have moduleNames sorted.

What you need to take advantage from the index, is to have a check for exact match in the condition:

SELECT moduleName, menuName 
FROM Modules
WHERE moduleName = LEFT('abc_def', LENGTH(moduleName));

Now we do have an exact match, but since the right part of the condition depends on the moduleName as well, this condition will be checked for each row. Since in his case MySQL can not predict how many rows will match, but it can predict that it will need randon disk access to fetch menuNames for each matching row, MySQL will not use the index.

So you have basically two approaches:

  1. if you know that the condition narrows the number of matching rows significantly, then you can just force the index
  2. another option is to extend your index to a covering composite index (moduleName, menuName), then all results for query will be fetched from the index directly (that is from memory).

Approach #2 (see SQLfiddle) will get you an index hit with a simple query, and should offer much better performances on a larger table. On small tables, I (i.e., lserni - see comment) don't think it's worth the effort.

Upvotes: 11

Angelin Nadar
Angelin Nadar

Reputation: 9300

Since, your dtabase engine is "InnoDB" All user data by default in InnoDB is stored in pages comprising a B-tree index

B-tree are good for following lookups:
● Exact full value (= xxx)
● Range of values (BETWEEN xx AND yy)
● Column prefix (LIKE 'xx%')
● Leftmost prefix

So, for your query, rather than using index or something to optimize, we can think of speeding up the query.

You can speed up the query by creating covering index .

A covering index refers to the case when all fields selected in a query are covered by an index, in that case InnoDB (not MyISAM) will never read the data in the table, but only use the data in the index, significantly speeding up the select. Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes. This means that if you run the following query on InnoDB:

SELECT `moduleName` ,`menuName` 
FROM `Modules1` 
WHERE 'abc_def' LIKE(CONCAT(`moduleName`,'%'))

MySQL will always use a covering index and will not access the actual table

To believe, go to **Explain**

What does Explain statement mean?

table: Indicates which table the output is affected.

type: Shows us which type of join is being used. From best to worst the types are: system, const, eq_ref, ref, range, index, all

possible_keys: Indicates which indices MySQL can choose from to find the rows in this table

key: Indicates the key (index) that MySQL actually decided to use. If MySQL decides to use one of the possible_keys indexes to look up rows, that index is listed as the key value.

key_len: It's the length of the key used. The shorter the better.

ref: Which column (or constant) is used

rows: The number of rows MySQL believes it must examine to execute the query.

extra Extra info: the bad ones to see here are "using temporary" and "using filesort"

I had 1,990 rows.

My Experiments:

I would recommend Isern's solution for where clause

    case 1) no indexes
explain select `moduleName` ,`menuName`  FROM `Modules1` WHERE moduleName = SUBSTRING('abc_def', 1, LENGTH(moduleName));
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | Modules | ALL  | NULL          | NULL | NULL    | NULL | 2156 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

Ways of creating covering indexes

case 2) ALTER TABLE `test`.`Modules1` ADD index `mod_name` (`moduleName`)

explain select `moduleName` ,`menuName`  FROM `Modules1` WHERE moduleName = SUBSTRING('abc_def', 1, LENGTH(moduleName));
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | Modules | ALL  | NULL          | NULL | NULL    | NULL | 2156 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+

Here, it shows index being used. See the columns: key, Extra

case 3) ALTER TABLE  `test`.`Modules1` DROP INDEX  `mod_name` ,
ADD INDEX  `mod_name` (  `moduleName` ,  `menuName` )

  explain select `moduleName` ,`menuName`  FROM `Modules1` WHERE moduleName = SUBSTRING('abc_def', 1, LENGTH(moduleName));
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | Modules | index | NULL          | mod_name | 1069    | NULL | 2066 | Using where; Using index |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)


case 4) ALTER TABLE  `test`.`Modules1` DROP INDEX  `mod_name` ,
ADD INDEX  `mod_name` (  `ID` ,  `moduleName` ,  `menuName` )

  explain select `moduleName` ,`menuName`  FROM `Modules1` WHERE moduleName = SUBSTRING('abc_def', 1, LENGTH(moduleName));
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | Modules | index | NULL          | mod_name | 1073    | NULL | 2061 | Using where; Using index |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

edit:

use where moduleName regexp "^(a|ab|abc|abc_|abc_d|abc_de|abc_def)$";
in place  of substring()

Upvotes: 4

Abhishek Salian
Abhishek Salian

Reputation: 934

Add index key to moduleName check http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html B-Tree Index Characteristics for more info

Not sure why you using LIKE, its always best to avoid it. My suggestion would be to get all the rows save it in JSON and then perform AJAX search on it.

Upvotes: 3

tanaydin
tanaydin

Reputation: 5296

like queries are not using indexes... but alternatively you can define an full text index for searching strings like this. but innodb engine is not supporting it, only myisam supports it.

Upvotes: 3

fthiella
fthiella

Reputation: 49049

I am not sure this is really a nice query, but it makes use of the index:

SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 7) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 6) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 5) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 4) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 3) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 2) = `moduleName`
UNION ALL
SELECT `moduleName` ,`menuName`
FROM `Modules` WHERE LEFT('abc_def', 1) = `moduleName`

General solution

And this is a general solution, using a dynamic query:

SET @search='abc_def';

SELECT
  CONCAT(
    'SELECT `moduleName` ,`menuName` FROM `Modules` WHERE ',
    GROUP_CONCAT(
      CONCAT(
        'moduleName=\'',
        LEFT(@search, ln),
        '\'') SEPARATOR ' OR ')
    )
FROM
  (SELECT DISTINCT LENGTH(moduleName) ln
   FROM Modules
   WHERE LENGTH(moduleName)<=LENGTH(@search)) s
INTO @sql;

This will create a string with a SQL query that has a condition WHERE moduleName='abc' OR moduleName='abc_' OR ... and it should be able to create the string quickly because of the index (if not, it can be improved a lot using a temporary indexed table with numbers from 1 to the maximum allowed length of your string, example in fiddle given). Then you can just execute the query:

PREPARE stmt FROM @sql;
EXECUTE stmt;

Please see fiddle here.

Upvotes: 3

Neo
Neo

Reputation: 5463

my answer may more complex

alter table Modules add column name_index int
alter table Modules add index name_integer_index(name_index);

when you insert to modules table you caculate a int value of moduleName, something like select ascii('a')

when run your query, you just need to run

SELECT `moduleName`, `menuName`
FROM   `Modules`
WHERE  name_index >
  (select ascii('a')) and name_index < (select ascii('abc_def'))

it will use name_integr_index

Upvotes: 3

Kickstart
Kickstart

Reputation: 21513

Similar to the suggestion by fthiella, but more flexible (as it can cope with longer string easily):-

SELECT DISTINCT `moduleName` ,`menuName`
FROM `Modules`
CROSS JOIN (SELECT a.i + b.i * 10 + c.i * 100 + 1 AS anInt FROM integers a, integers b, integers c) Sub1
WHERE LEFT('abc_def', Sub1.anInt) = `moduleName`

This (as typed) copes with string up to 1000 characters long but is slower than fthiellas solution. Can easily be cut down for strings up to 100 chars long at which point it seems marginally quicker than fthiellas solution.

Putting a check for length in it does speed it up a bit:-

SELECT SQL_NO_CACHE  DISTINCT `moduleName` ,`menuName`
FROM `Modules`
INNER JOIN (SELECT a.i + b.i * 10 + c.i * 100 + 1 AS anInt FROM integers a, integers b, integers c ) Sub1
ON Sub1.anInt <= LENGTH('abc_def') AND Sub1.anInt <= LENGTH(`moduleName`)
WHERE LEFT('abc_def', Sub1.anInt) = `moduleName`

Or with a slight amendment to bring the possible substrings back from the subselect:-

SELECT SQL_NO_CACHE  DISTINCT `moduleName` ,`menuName`
FROM `Modules`
CROSS JOIN (SELECT DISTINCT LEFT('abc_def', a.i + b.i * 10 + c.i * 100 + 1) AS aStart FROM integers a, integers b, integers c WHERE( a.i + b.i * 10 + c.i * 100 + 1) <= LENGTH('abc_def')) Sub1
WHERE aStart = `moduleName`

Note that these solutions depends on a table of integers with a single column and rows with the values 0 to 9.

Upvotes: 3

SteveP
SteveP

Reputation: 19093

You are effectively doing a regex on the field, so no key is going to work. However, in your example, you could make it a bit more efficient as each moduleName that matches must be less than or equal to 'abc_def', so you can add:

and moduleName <= 'abc_def'

The only other alternative I can think of is:

where modleName in ('a','ab','abc','abc_','abc_d','abc_de','abc_def')

Not pretty.

Upvotes: 7

Samina
Samina

Reputation: 154

DECLARE @SEARCHING_TEXT AS VARCHAR(500)

SET @SEARCHING_TEXT = 'ab'

SELECT 'moduleName' ,'menuName' FROM [MODULES] WHERE FREETEXT (MODULENAME, @SEARCHING_TEXT );

Upvotes: 3

Klas Lindb&#228;ck
Klas Lindb&#228;ck

Reputation: 33273

Try adding an index hint to your question.

SELECT `moduleName` ,`menuName` 
FROM `Modules` USE INDEX (col1_index,col2_index) 
WHERE 'abc_def' LIKE(CONCAT(`moduleName`,'%'))

Upvotes: 4

Related Questions