Maurice
Maurice

Reputation: 1092

MySQL fulltext search with word boundaries

I've read some articles and issues, but couldn't find a satisfying solution. I want to select related records from the database when a user fills in a form; on the same way the functionality works on this site when you ask a question.

Consider a database table with the following three records in the column subject

+---+---------------------------------------------------+
| 1 | Pagina aanmaken en beter doorzoekbaar maken       |
+---+---------------------------------------------------+
| 2 | Sorteerfunctie uitbreiden in zoek-en-boek functie |
+---+---------------------------------------------------+
| 3 | Zoek de verschillen tussen de pagina's            |
+---+---------------------------------------------------+

I start my search query with the word zoek so i want to query the most relevant results from the database on the term zoek. I came up with the following query:

SELECT 
    id, 
    subject, 
    MATCH(
        subject
    ) 
    AGAINST(
        'zoek*'
        IN BOOLEAN MODE
    ) 
    AS 
        score
FROM 
    Issues 
WHERE 
    MATCH(
        subject
    ) 
    AGAINST(
        'zoek*'
        IN BOOLEAN MODE
    )

When i run this query i'd expected all the records to show and (probably, i don't know how specificity works in MySQL) ID 3 to display on top (because exact word match).

Instead the results of the query were only row 2 and 3 with exactly the same score (0.031008131802082062).

What do i need to change in my query to match appropriate records? Also considering that users can type in keywords or sentences.

Upvotes: 9

Views: 3078

Answers (5)

Cédric Miachon
Cédric Miachon

Reputation: 344

There is a workaound for your case:

SELECT 
    id, 
    subject, 
    IF (subject LIKE "zoek %" OR subject LIKE "% zoek %" OR subject LIKE "% zoek", 
        1, 
        IF (subject LIKE "% zoek%",
            0.5,
            IF (subject LIKE "%zoek%",
                0.2,
                0)
            )
        ) as score
FROM 
    Issues 
WHERE subject LIKE "%zoek%"
ORDER by score DESC

Expected result:

+---+---------------------------------------------------+------+
|id |   subject                                         |score |    
+---+---------------------------------------------------+------+
|3  | Zoek de verschillen tussen de pagina's            | 1    |
+---+---------------------------------------------------+------+
|2  | Sorteerfunctie uitbreiden in zoek-en-boek functie | 0.5  |
+---+---------------------------------------------------+------+
|1  | Pagina aanmaken en beter doorzoekbaar maken       | 0.2  |
+---+---------------------------------------------------+------+

Upvotes: 0

Mary Grace Bautista
Mary Grace Bautista

Reputation: 11

Try this queries for different results:

  1. Select all subject that starts with letter "z":
    SELECT ID, Subject FROM table_name WHERE Subject LIKE 'z%';

  2. Select all subject that ends with letter "z":
    SELECT ID, Subject FROM table_name WHERE Subject LIKE '%z';

  3. Select all subject containing the pattern "zoek":
    SELECT ID, Subject FROM table_name WHERE Subject LIKE '%zoek%';

Upvotes: 1

RandomSeed
RandomSeed

Reputation: 29749

As others advised, MySQL's FULLTEXT indexes do not support leading wildcards, and therefore cannot help in searching for suffixes.

However, the new ngram Full-Text Parser might help :

The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation, MySQL provides an ngram full-text parser (...).

An ngram is a contiguous sequence of n characters from a given sequence of text. The ngram parser tokenizes a sequence of text into a contiguous sequence of n characters.

As I have never used this feature, I cannot help further on this topic. Notice however:

Because an ngram FULLTEXT index contains only ngrams, and does not contain information about the beginning of terms, wildcard searches may return unexpected results.

Upvotes: 1

Rick James
Rick James

Reputation: 142208

Sorry...

Middle of word (doorzoekbaar) is, by definition of MySQL's FULLTEXT, not something that will be found. FULLTEXT has no concept of "compound nouns", so it won't attempt to pick the word apart.

The definition of a "word" in FULLTEXT give 'dash' and 'space' the same meaning -- namely a word boundary. So, zoek de... and zoek-... are given equal weight.

Look at Solr, Lucene, and other 3rd party "fulltext solutions". They may (or may not) provide what you want.

zoek* and +zoek*, when run with IN BOOLEAN MODE will find zoekbaar.

Upvotes: 0

R.Costa
R.Costa

Reputation: 1393

MySQL full-text search doesn't support suffixes.

To get the first row you would have to do a match against '*zoek*' which is currently not allowed.

The alternative is to use

SELECT id, subject
FROM Issues 
WHERE subject LIKE '%zoek%' 

Upvotes: 1

Related Questions