Reputation: 54333
I've got the following table in a MySQL 5.6 with an InnoDB full text index.
CREATE TABLE `blacklist_entries` (
`blacklist_entry_id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`insertat` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`blacklist_entry_id`),
FULLTEXT KEY `ftk_b_n` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=660004 DEFAULT CHARSET=utf8
This is the data we are dealing with. All the entries in name
are single words, sometimes with -
, but never more than one word.
mysql> select * from blacklist_entries where name like '%battle%';
+--------------------+---------------------+---------------------+
| blacklist_entry_id | name | insertat |
+--------------------+---------------------+---------------------+
| 4159 | battleground | 2018-01-16 12:15:46 |
| 604218 | battle | 2018-01-16 12:18:59 |
| 604219 | battlefield | 2018-01-16 12:18:59 |
| 604220 | battlefields | 2018-01-16 12:18:59 |
| 604221 | battles | 2018-01-16 12:18:59 |
| 660003 | abcbattle | 2018-01-16 12:49:34 |
+--------------------+---------------------+---------------------+
I would like to create a full text search this list that is able to sort by relevance, where a full match has a higher score than a partial match.
When I run this query
select
*,match(name) against ('battle battle* *battle' IN BOOLEAN MODE) as score
from blacklist_entries where match(name)
against ('battle battle* *battle' IN BOOLEAN MODE);
I get the following result.
+--------+---------------------+---------------------+--------------------+
| id | name | insertat | score |
+--------+---------------------+---------------------+--------------------+
| 4159 | battleground | 2018-01-16 12:15:46 | 17.11724281311035 |
| 604218 | battle | 2018-01-16 12:18:59 | 17.11724281311035 |
| 604219 | battlefield | 2018-01-16 12:18:59 | 17.11724281311035 |
| 604220 | battlefields | 2018-01-16 12:18:59 | 17.11724281311035 |
| 604221 | battles | 2018-01-16 12:18:59 | 17.11724281311035 |
+--------+---------------------+---------------------+--------------------+
There are two issues with this.
It does not match from the right, and this line is missing
| 660003 | abcbattle | 2018-01-16 12:49:34 | 17.11724281311035 |
Can I change something in the query to convince MySQL to do score these matches differently?
If that is not possible, are there viable alternatives, like implementing my own scoring in a stored procedure, and iterate the full list again to at least identify the exact match to rank it higher?
If none of these work, I would have to build the scoring in my application layer completely. Obviously that is going to cost additional resources, so I would like to avoid that.
Upvotes: 1
Views: 347
Reputation: 142296
MATCH(name) AGAINST('battle*' IN BOOLEAN MODE)
+ 0.0001 * (name = 'battle') AS score
That should match all the words in your example, then boost the exact match a little bit.
For boosting when the word is found in the middle of a text field:
+ 0.0001 * (REGEXP '[[:<:]]battle[[:>:]]')
(There is nothing special about 0.0001
; it is big enough to change the score, but not so big as to change the "result".)
Upvotes: 1
Reputation: 2468
https://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
The score is the amount of time the word appears, so for all of them it appears once and it splits the result between.
Upvotes: 1