Filippo oretti
Filippo oretti

Reputation: 49813

MYsql FULLTEXT query yields unexpected ranking; why?

i'm trying fulltext search with tags but it doesn't works properly for me chek attached image pleaseenter image description here

The query is:

 SELECT *, 
         MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score 
    FROM items
ORDER BY score DESC

why does the score is not ordering in the right order fields? if you check the second row has all the tags i searched while the first field does not have tag3 keyword .

i mean id fields order should be : 5,1,2 .. etc and NOT 1,5,2..etc

where is my mistake?

then i would like to search first in tags field then if no results i would like to search the same keyword as FULLTEXT inside description field, so users will search both in tags and description if tags does not match , is it possible in the same query or i need two separated queries?

Upvotes: 3

Views: 1405

Answers (3)

RolandoMySQLDBA
RolandoMySQLDBA

Reputation: 44343

First of all, here is your sample data loaded into MySQL 5.5.12 on my Windows7 Machine

mysql> DROP DATABASE IF EXISTS lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE DATABASE lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> USE lspuk
Database changed
mysql> CREATE TABLE items
    -> (
    ->     id int not null auto_increment,
    ->     description VARCHAR(30),
    ->     tags VARCHAR(30),
    ->     primary key (id),
    ->     FULLTEXT tags_ftndx (tags)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO items (description,tags) VALUES
    -> ('the first' ,'tag1 tag3 tag4'),
    -> ('the second','tag5 tag1 tag2'),
    -> ('the third' ,'tag5 tag1 tag9'),
    -> ('the fourth','tag5 tag6 tag2'),
    -> ('the fifth' ,'tag4 tag3 tag6'),
    -> ('the sixth' ,'tag2 tag3 tag6');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql>

Please look at the way the tag population is happening in MySQL:

mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION
    -> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION
    -> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION
    -> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION
    -> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION
    -> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION
    -> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%';
+------+-----------+
| tag1 | tag_count |
+------+-----------+
| tag1 |         3 |
| tag2 |         3 |
| tag3 |         3 |
| tag4 |         2 |
| tag5 |         3 |
| tag6 |         3 |
| tag9 |         1 |
+------+-----------+
7 rows in set (0.00 sec)

mysql>

Look carefully and please note the following facts:

  1. Each row has exactly 3 tags
  2. The order the tags are requested vs how many of each tag exists seems to govern the score

If you remove tag4 and run the query, you get no score at all

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  1 | the first   | tag1 tag3 tag4 |     0 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
|  4 | the fourth  | tag5 tag6 tag2 |     0 |
|  5 | the fifth   | tag4 tag3 tag6 |     0 |
|  6 | the sixth   | tag2 tag3 tag6 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

The evaluation method seems to be based on the average number of tokens field and presence and/or absence of specific values in a particular order affects scoring. If you apply different styles of scoring and tag specification, note the various scores:

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+--------------------+
| id | description | tags           | score              |
+----+-------------+----------------+--------------------+
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |
+----+-------------+----------------+--------------------+
6 rows in set (0.00 sec)

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  5 | the fifth   | tag4 tag3 tag6 |     3 |
|  1 | the first   | tag1 tag3 tag4 |     2 |
|  6 | the sixth   | tag2 tag3 tag6 |     2 |
|  4 | the fourth  | tag5 tag6 tag2 |     1 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  5 | the fifth   | tag4 tag3 tag6 |     1 |
|  1 | the first   | tag1 tag3 tag4 |     0 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
|  4 | the fourth  | tag5 tag6 tag2 |     0 |
|  6 | the sixth   | tag2 tag3 tag6 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

mysql>

The solution would seem to be evaluate a BOOLEAN MODE score, and then a non-BOOLEAN MODE score as follows:

SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
FROM items ORDER BY score2 DESC, score1 DESC;

Here is the result against your sample data:

mysql> SELECT *,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
    -> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
    -> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags           | score1             | score2 |
+----+-------------+----------------+--------------------+--------+
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      1 |
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      0 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |      0 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |      0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)

mysql>

or you could try not using the plus signs

mysql> SELECT *,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2
    -> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags           | score1             | score2 |
+----+-------------+----------------+--------------------+--------+
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      3 |
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      2 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |      2 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |      1 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)

mysql>

Either way, you will have to incorporate BOOLEAN MODE and non-BOOLEAN mode at the same time.

Upvotes: 1

O. Jones
O. Jones

Reputation: 108676

In this document http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html it says "For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results."

If your items table is small -- a sample table, for example -- you are probably hitting this problem and getting a "bizarre" result.

You may wish to try this query IN BOOLEAN MODE to see if your results match your prediction. Try this.

    SELECT *, 
           MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS score 
      FROM items
  ORDER BY score DESC

The Boolean mode disables the word-distribution ranking. Notice that you should understand the difference between natural-language and Boolean modes, and once you have a decently sized table, make a smart choice about which one to use. If you're searching for the kind of tags that blogs have in them, Boolean may be the way to go.

Upvotes: 2

Kneel-Before-ZOD
Kneel-Before-ZOD

Reputation: 4221

Modify the order to order by score DESC, id DESC.
Assuming the score values are the same, the row with 5 will show up first.

Upvotes: 0

Related Questions