Uwe
Uwe

Reputation: 295

mysql - speedup regex

I have a table:

+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| idurl  | int(11)          | NO   | PRI | NULL    | auto_increment |
| idsite | int(10) unsigned | NO   | MUL | NULL    |                |
| url    | varchar(2048)    | NO   |     | NULL    |                |
+--------+------------------+------+-----+---------+----------------+

the select statement is:

SELECT idurl,
       url
  FROM URL
 WHERE idsite = 34
   AND url REGEXP '^https\\://www\\.domain\\.com/checkout/step_one\\.php.*'

The query needs 5 seconds on a table with 1000000 rows. Can I achieve a speedup with indexes or something else?

Upvotes: 3

Views: 502

Answers (4)

a'r
a'r

Reputation: 37019

You could use the LIKE operator instead of a regular expression. But as your regular expression is simple, this may or may not improve performance.

You could split out the domain into a separate field, index it and use that in your where clause. If the URLs that you store are from many different domains then such an index could improve performance considerably.

Upvotes: 0

Timothy
Timothy

Reputation: 2477

Looks like you don't really need that REGEXP.

This clause should suffice:

AND eu.url LIKE 'https://www.domain.com/checkout/step_one.php%'

Upvotes: 0

Andomar
Andomar

Reputation: 238246

Looks like a LIKE might suffice. LIKE uses % as a wildcard for any number of characters.

AND url LIKE 'https://www.domain.com/checkout/step_one.php%'

LIKE does not require a starting anchor like ^. Only the second example would match:

'Sherlock and Watson' LIKE 'and%'
'Sherlock and Watson' LIKE '%and%'
'Sherlock and Watson' LIKE '%and'

Upvotes: 3

shoover
shoover

Reputation: 3140

Any index involving the URL column is likely not going to help you because the database engine still has to walk through the contents of that column to check whether the contents match the regex.

What may help you, depending on how many unique values of IDSITE you have, is to either place an index on IDSITE or do an initial select WHERE IDSITE = 34, and use that subquery as the target of your query on URL.

Something like:

select
    idurl,
    url
from
    (select idurl, url from uwe_url where idsite = 34)
where
    url REGEXP '^https\\://www\\.domain\\.com/checkout/step_one\\.php.*'

But I'm pretty sure you can't get around the text parsing for the URL column match.

Upvotes: 2

Related Questions