caner taşdemir
caner taşdemir

Reputation: 203

mySQL - Should I prefer REGEXP for faster search results?

I have a query which searches always 3 values in a spesified row. There are too many records in the table. I heard on the forums that REGEXP is a bad choice. I am confused, and would like to hear your suggestions for faster search results.

SELECT shareit,id FROM shares where from='".$nick."' and shareit 
REGEXP 'youtube|dailymotion|vimeo' ORDER BY id DESC

Upvotes: 0

Views: 101

Answers (2)

Dave Thomas
Dave Thomas

Reputation: 3837

REGEXP matching will always be slower than direct equality comparison. Take for example the amount of code that makes up regular expression parsing for mysql: MySQL Open Source Regex Code

Take a look at what it takes to do a regular expression, and consider the complexity of doing the operation this way. Extra reference: https://softwareengineering.stackexchange.com/questions/122440/how-do-regular-expressions-actually-work

When instead doing a direct equality comparison, more optimizations can be made, and the operation can be less costly.

How much slower is regex? Well that is subjective. You can run some benchmarks and choose the solution that makes sense to you. Sometimes you need the complexity of regular expressions, and the costs are negligible.

In my opinion if you are going for an exact match, do the extra typing, and go with:

SELECT shareit,id FROM shares where from='".$nick."' and 
(shareit = 'youtube' OR shareit = 'dailymotion' OR shareit = 'vimeo') 
ORDER BY id DESC

If you aren't looking for an exact match then use the use the like query from Gordon Linoff's answer. Also important to note, an index on the columns you are comparing against would be your best option for performance gains.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270713

This is your query:

SELECT shareit, id
FROM shares 
WHERE from = '".$nick."' and
      shareit REGEXP 'youtube|dailymotion|vimeo'
ORDER BY id DESC;

Regular expressions follow different rules from LIKE. So, your query is equivalent to:

SELECT shareit, id
FROM shares 
WHERE `from` = '".$nick."' and
      (shareit LIKE '%yourtube%' OR
       shareit LIKE '%dailymotion%' OR
       shareit LIKE '%vimeo%'
      )
ORDER BY id DESC;

The use of wildcards at the beginning of the LIKE patterns means that an index cannot be used for that part of the comparison. The query can make use an index on shares(from).

Which is faster, the like or the regexp? In all honesty, most of the work of the query is fetching the rows. The comparisons are probably comparable. One might be a bit faster than the other, but that is a marginal consideration.

If you want exact matches, then the query would be:

SELECT shareit, id
FROM shares 
WHERE from = '".$nick."' and
      shareit IN ('youtube', 'dailymotion', 'vimeo')
ORDER BY id DESC;

For this query, an index on shares(from, shareit) is definitely better than the REGEXP version.

Upvotes: 2

Related Questions