Reputation: 1725
We have a full-text index set up to use for searches on a website (MySQL/PHP).
The searches work great most of the time, but we keep running into these strange errors.
For example:
1) This works: "Chinese Wok"
2) This doesn't: "First Wok"
My assumption is that the 2nd doesn't work because:
a) It kicks out 'wok' since it's only 3 letters
b) It kicks out 'first' because it's in some list of words to ignore.
Are my assumptions correct?
If so, how would I go about tweaking things to both:
a) Somehow whitelist 'first' as a word to use in the search
b) Somehow whitelist 'wok' despite it being only a 3-letter word
Thanks as always!
Upvotes: 1
Views: 1530
Reputation: 126005
Are my assumptions correct?
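You are correct on both counts. Assuming a MyISAM table with default settings, 'wok' is shorter than the four-character minimum and 'first' is in the default stopword list. "Chinese Wok" still returns results because 'Chinese' passes both filters, whereas "First Wok" is left with no searchable terms. As documented under Natural Language Full-Text Searches: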
Some words are ignored in full-text searches:
Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.
Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overwritten by a user-defined list.
The default stopword list is given in Section 12.9.4, “Full-Text Stopwords”. The default minimum word length and stopword list can be changed as described in Section 12.9.6, “Fine-Tuning MySQL Full-Text Search”.
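To make the two filters concrete, here is a rough Python sketch of the behaviour the manual describes. It is not MySQL's actual parser: the function name `searchable_terms` is hypothetical, `SAMPLE_STOPWORDS` is only a tiny illustrative subset of the built-in list, and the tokenizer assumes plain ASCII words.

```python
import re

# Assumption: a tiny illustrative subset of the built-in stopword list,
# not the full list given in the manual.
SAMPLE_STOPWORDS = {"first", "the", "some"}

# Default minimum word length quoted above (ft_min_word_len).
MIN_WORD_LEN = 4


def searchable_terms(query, min_len=MIN_WORD_LEN, stopwords=SAMPLE_STOPWORDS):
    """Return the words of `query` that survive both filters."""
    # Simplified tokenizer: letters, digits, underscore and apostrophe form words.
    words = re.findall(r"[0-9A-Za-z_']+", query.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


print(searchable_terms("Chinese Wok"))  # ['chinese']
print(searchable_terms("First Wok"))    # []
# With a three-character minimum and stopword filtering disabled:
print(searchable_terms("First Wok", min_len=3, stopwords=set()))  # ['first', 'wok']
```

Under these assumptions "Chinese Wok" keeps one term, so it can still match, while "First Wok" keeps none.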
As documented under Fine-Tuning MySQL Full-Text Search:
The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.4, “Server System Variables”.) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:
    [mysqld]
    ft_min_word_len=3
Then restart the server and rebuild your FULLTEXT indexes. Note particularly the remarks regarding myisamchk in the instructions following this list.
To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.4, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.
The stopword list is free-form. That is, you may use any nonalphanumeric character such as newline, space, or comma to separate stopwords. Exceptions are the underscore character (“_”) and a single apostrophe (“'”) which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 10.1.3.1, “Server Character Set and Collation”.
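If you write your own stopword file, the free-form rule above can be checked with a short Python sketch before restarting the server. This only approximates the rule as quoted: `parse_stopwords` is a hypothetical helper, it assumes ASCII text, and MySQL's own parsing may differ in details such as character set handling.

```python
import re

# Any run of characters other than letters, digits, underscore or apostrophe
# is treated as a separator, per the rule quoted above (ASCII-only assumption).
SEPARATORS = re.compile(r"[^0-9A-Za-z_']+")


def parse_stopwords(text):
    """Split free-form stopword file contents into individual words."""
    return [word for word in SEPARATORS.split(text) if word]


# Hypothetical file contents mixing commas, newlines, spaces and semicolons.
sample = "the, some\nand don't snake_case;of"
print(parse_stopwords(sample))
# ['the', 'some', 'and', "don't", 'snake_case', 'of']
```

To keep 'first' searchable, leave it out of the custom list; an empty ft_stopword_file value disables stopword filtering entirely, as noted above.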
Upvotes: 3