tradyblix
tradyblix

Reputation: 7579

MySQL Fulltext search but using LIKE

I'm recently doing some string searches from a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kinda thing. I was using LIKE statement to get a match of a searched keyword.

I came across MySQL's Full-Text search which I tried so I added a fulltext index to my str column. I'm aware that Full-text searches doesn't work on virtually created tables or even with Views so queries with sub-selects will not fit. I mentioned I was doing a nested queries, example is:

SELECT s2.id, s2.str 
FROM 
    (
    SELECT s1.id, s1.str 
    FROM 
        (
         SELECT id, str 
         FROM strings 
         WHERE str LIKE '%term%'
         ) AS s1 
    WHERE s1.str LIKE '%another_term%'
    ) AS s2 
WHERE s2.str LIKE '%a_much_deeper_term%';

This is actually not applied to any code yet, I was just doing some tests. Also, searching strings like this can be easily achieved by using Sphinx (performance wise) but let's consider Sphinx not being available and I want to know how this will work well in pure SQL query. Running this query on a table without Full-text added takes about 2.97 secs. (depends on the search term). However, running this query on a table with Full-text added to the str column finished in like 104ms which is fast (i think?).

My question is simple, is it valid to use LIKE or is it a good practice to use it at all in a table with Full-text added when normally we would use MATCH and AGAINST statements?

Thanks!

Upvotes: 2

Views: 923

Answers (2)

Martin Zvarík
Martin Zvarík

Reputation: 2479

FULLTEXT seems useless for most of the usages.

And using LIKE '%whatever%' is extremely slow if you have 1+ mill rows and are using InnoDB, because it doesn't use any INDEX. MyIsam is far quicker, but if you have many writes the TABLE LOCK will make it slow.

Upvotes: 0

vbence
vbence

Reputation: 20333

In this case you not neccessarily need subselects. You can siply use:

SELECT id, str
FROM item_strings
WHERE str LIKE '%term%'
   AND str LIKE '%another_term%'
   AND str LIKE '%a_much_deeper_term%'

... but also raises a good question: the order in which you are excluding the rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so starting with a_much_deeper_term it will eliminate most of the records then perform addtitional comparsion only on a few rows. - Contrary to this, if you start with term you will probably end up with many possible records then you have to compare them against the st of the terms.

The interesting part is that you can force the order in which the comparsion is made by using your original subselect example. This gives the opportunity to make a decision which term is the most restrictive based upon more han just the length, but for example:

  • the ratio of consonants a vowels
  • the longest chain of consonants of the word
  • the most used vowel in the word

...etc. You can also apply some heuristics based on the type of textual infomation you are handling.

Edit:

This is just a hunch but it could be possible to apply the LIKE to the words in the fulltext indexitself. Then match the rows against the index as if you have serched for full words.

I'm not sure if this is actually done, but it would be a smart thing to pull off by the MySQL people. Also note that this theory can only be used if all possible ocurrences arein fact in the fulltext search. For this you need that:

  • Your search pattern must be at least the size of the miimal word-length. (If you re searching for example %id% then it can be a part of a 3 letter word too, which is excluded by default form FULLTEXT index).
  • Your search pattern must not be a substring of any listed excluded word for example: and, of etc.
  • Your pattern must not contain any special characters.

Upvotes: 1

Related Questions