Shiplu Mokaddim
Shiplu Mokaddim

Reputation: 57680

how to improve MySQL REGEXP search?

Question are,

1.How can I improve the performance of SELECT queries in mysql utilizing REGEXP?

The table looks like

create table `tweets`(
    `id` bigint auto_increment,
    `tweet` varchar(140),
    `time` datetime,
    primary key(`id`)
);

Here the following query takes about 0.35 seconds.

select tweet from tweets where tweet regexp '^[abcdef]{1,4}$';
  1. Will indexing tweet make it faster? If so, what type of index should I use?
  2. My table engine is InnoDB, Is there any other table engine that will become beneficial?

Upvotes: 0

Views: 1682

Answers (2)

redolent
redolent

Reputation: 4259

If the search you're looking for is at the start of a string, you can use LIKE as a high-level filter then check again with REGEXP:

select tweet from tweets 
where 
    ( 
      tweet LIKE 'a%' OR
      tweet LIKE 'b%' OR
      tweet LIKE 'c%' OR
      tweet LIKE 'd%' OR
      tweet LIKE 'e%'
    )
    AND LENGTH(tweet) <= 4 -- try taking this line out line too
    AND tweet regexp '^[abcdef]{1,4}$';

In spite of being a little convoluted, this should be a lot faster.

Upvotes: 1

user610217
user610217

Reputation:

Your best bet is to reduce the result set to evaluate against the regular expression before evaluating. Regular expressions are, for all intents and purposes, impossible to index for.

If I had to come up with a way for this, I would examine patterns that are commonly searched against, and mark them in some indexible way at insert time. For example if you use the ^[abcdef]{1,4}$ expression to search against a lot, I'd make a boolean column first4AThruF and on an insert/ update trigger, update the column to true or false based on whether or not it matched the regular expression. If I indexed the first4AThruF column, and the column had enough selectivity, I could write the query:

select tweet from tweets where first4AThruF = true;

and this should be pretty zippy.

Other possibilities to consider are full-text queries or LIKE clauses, although in the case mentioned above I don't expect them to work well.

Upvotes: 3

Related Questions