simbabque
simbabque

Reputation: 54333

How do I get different scores for partially matching single-word searches on an InnoDB full text index?

I've got the following table in a MySQL 5.6 with an InnoDB full text index.

CREATE TABLE `blacklist_entries` (
  `blacklist_entry_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL,
  `insertat` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`blacklist_entry_id`),
  FULLTEXT KEY `ftk_b_n` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=660004 DEFAULT CHARSET=utf8

This is the data we are dealing with. All the entries in name are single words, sometimes with -, but never more than one word.

mysql> select * from blacklist_entries where name like '%battle%';
+--------------------+---------------------+---------------------+
| blacklist_entry_id | name                | insertat            |
+--------------------+---------------------+---------------------+
|               4159 | battleground        | 2018-01-16 12:15:46 |
|             604218 | battle              | 2018-01-16 12:18:59 |
|             604219 | battlefield         | 2018-01-16 12:18:59 |
|             604220 | battlefields        | 2018-01-16 12:18:59 |
|             604221 | battles             | 2018-01-16 12:18:59 |
|             660003 | abcbattle           | 2018-01-16 12:49:34 |
+--------------------+---------------------+---------------------+

I would like to create a full text search this list that is able to sort by relevance, where a full match has a higher score than a partial match.

When I run this query

select
    *,match(name) against ('battle battle* *battle' IN BOOLEAN MODE) as score
from blacklist_entries where match(name)
    against ('battle battle* *battle' IN BOOLEAN MODE);

I get the following result.

+--------+---------------------+---------------------+--------------------+
|     id | name                | insertat            | score              |
+--------+---------------------+---------------------+--------------------+
|   4159 | battleground        | 2018-01-16 12:15:46 |  17.11724281311035 |
| 604218 | battle              | 2018-01-16 12:18:59 |  17.11724281311035 |
| 604219 | battlefield         | 2018-01-16 12:18:59 |  17.11724281311035 |
| 604220 | battlefields        | 2018-01-16 12:18:59 |  17.11724281311035 |
| 604221 | battles             | 2018-01-16 12:18:59 |  17.11724281311035 |
+--------+---------------------+---------------------+--------------------+

There are two issues with this.

Can I change something in the query to convince MySQL to do score these matches differently?

If that is not possible, are there viable alternatives, like implementing my own scoring in a stored procedure, and iterate the full list again to at least identify the exact match to rank it higher?

If none of these work, I would have to build the scoring in my application layer completely. Obviously that is going to cost additional resources, so I would like to avoid that.

Upvotes: 1

Views: 347

Answers (2)

Rick James
Rick James

Reputation: 142296

MATCH(name) AGAINST('battle*' IN BOOLEAN MODE)
    + 0.0001 * (name = 'battle')  AS score

That should match all the words in your example, then boost the exact match a little bit.

For boosting when the word is found in the middle of a text field:

   + 0.0001 * (REGEXP '[[:<:]]battle[[:>:]]')

(There is nothing special about 0.0001; it is big enough to change the score, but not so big as to change the "result".)

Upvotes: 1

farbiondriven
farbiondriven

Reputation: 2468

https://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html

The score is the amount of time the word appears, so for all of them it appears once and it splits the result between.

Upvotes: 1

Related Questions