TheMan68
TheMan68

Reputation: 1469

mysql full text wildcard search like regex

Upto now I have been using

WHERE col REGEXP 'IN (.*) WE TRUST'

But ever since adding the full index search to this column then this query is very slow.

I'm waning to know know how I could implement a wildcard search using full text index searches.

These are two queries that I have been playing with but still Im getting lots of results that is unexpected and not sure at all why my query is pulling those results.

WHERE MATCH (markIdentification) AGAINST ('IN (.*) WE TRUST')
WHERE MATCH (markIdentification) AGAINST ('+IN (.*) +WE +TRUST')
WHERE MATCH (markIdentification) AGAINST ('+IN * +WE +TRUST')

These are the only ones that seem to get even close. Any suggestions?

Thank you

Update for question ref:

SELECT * from table
 WHERE MATCH (col) AGAINST ('+IN * +WE +TRUST')
   AND col LIKE '%IN (.*) WE TRUST%'

Is this correct? If not then how would you do it?

Upvotes: 3

Views: 4252

Answers (1)

O. Jones
O. Jones

Reputation: 108641

The FULLTEXT search engine ignores words shorter than three characters. You can set the innodb_ft_min_token_size option to change that, then regenerate your FULLTEXT indexes.

The + (and -) syntax in AGAINST are boolean search mode things. So to use + you need

WHERE MATCH (markIdentification) AGAINST ('+IN +WE +TRUST' IN BOOLEAN MODE )

BOOLEAN mode has lots of special characters to control searches, but * standing alone is not one of them. You can say 'TRUST*' to match trust, trustee, and trusted.

Taking Gordon's suggestion, you might try this:

WHERE MATCH (markIdentification) AGAINST ('+IN +WE +TRUST' IN BOOLEAN MODE )
  AND  markIdentification REGEXP 'IN (.*) WE TRUST'

This will use your FULLTEXT index to look for possible matches, and REGEXP to get more exact results. The expensive REGEXP operation, then, can run on many fewer rows.

(Beware IN NATURAL LANGUAGE MODE when your tables don't have many rows. It can give strange results. The indexer decides which words are too common to bother with, and if you have a small number of words, that decision gets distorted.)

Upvotes: 4

Related Questions