Reputation: 49813
i'm trying fulltext search with tags but it doesn't works properly for me chek attached image please
The query is:
SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score
FROM items
ORDER BY score DESC
why does the score is not ordering in the right order fields? if you check the second row has all the tags i searched while the first field does not have tag3 keyword .
i mean id fields order should be : 5,1,2 .. etc and NOT 1,5,2..etc
where is my mistake?
then i would like to search first in tags field then if no results i would like to search the same keyword as FULLTEXT inside description field, so users will search both in tags and description if tags does not match , is it possible in the same query or i need two separated queries?
Upvotes: 3
Views: 1405
Reputation: 44343
First of all, here is your sample data loaded into MySQL 5.5.12 on my Windows7 Machine
mysql> DROP DATABASE IF EXISTS lspuk;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE lspuk;
Query OK, 1 row affected (0.00 sec)
mysql> USE lspuk
Database changed
mysql> CREATE TABLE items
-> (
-> id int not null auto_increment,
-> description VARCHAR(30),
-> tags VARCHAR(30),
-> primary key (id),
-> FULLTEXT tags_ftndx (tags)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO items (description,tags) VALUES
-> ('the first' ,'tag1 tag3 tag4'),
-> ('the second','tag5 tag1 tag2'),
-> ('the third' ,'tag5 tag1 tag9'),
-> ('the fourth','tag5 tag6 tag2'),
-> ('the fifth' ,'tag4 tag3 tag6'),
-> ('the sixth' ,'tag2 tag3 tag6');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql>
Please look at the way the tag population is happening in MySQL:
mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION
-> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION
-> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION
-> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION
-> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION
-> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION
-> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%';
+------+-----------+
| tag1 | tag_count |
+------+-----------+
| tag1 | 3 |
| tag2 | 3 |
| tag3 | 3 |
| tag4 | 2 |
| tag5 | 3 |
| tag6 | 3 |
| tag9 | 1 |
+------+-----------+
7 rows in set (0.00 sec)
mysql>
Look carefully and please note the following facts:
If you remove tag4 and run the query, you get no score at all
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 1 | the first | tag1 tag3 tag4 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 5 | the fifth | tag4 tag3 tag6 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
The evaluation method seems to be based on the average number of tokens field and presence and/or absence of specific values in a particular order affects scoring. If you apply different styles of scoring and tag specification, note the various scores:
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+--------------------+
| id | description | tags | score |
+----+-------------+----------------+--------------------+
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 |
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+--------------------+
6 rows in set (0.00 sec)
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 5 | the fifth | tag4 tag3 tag6 | 3 |
| 1 | the first | tag1 tag3 tag4 | 2 |
| 6 | the sixth | tag2 tag3 tag6 | 2 |
| 4 | the fourth | tag5 tag6 tag2 | 1 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 5 | the fifth | tag4 tag3 tag6 | 1 |
| 1 | the first | tag1 tag3 tag4 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
mysql>
The solution would seem to be evaluate a BOOLEAN MODE score, and then a non-BOOLEAN MODE score as follows:
SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
FROM items ORDER BY score2 DESC, score1 DESC;
Here is the result against your sample data:
mysql> SELECT *,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
-> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
-> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags | score1 | score2 |
+----+-------------+----------------+--------------------+--------+
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 | 1 |
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 | 0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)
mysql>
or you could try not using the plus signs
mysql> SELECT *,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2
-> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags | score1 | score2 |
+----+-------------+----------------+--------------------+--------+
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 | 3 |
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 | 2 |
| 6 | the sixth | tag2 tag3 tag6 | 0 | 2 |
| 4 | the fourth | tag5 tag6 tag2 | 0 | 1 |
| 2 | the second | tag5 tag1 tag2 | 0 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 | 0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)
mysql>
Either way, you will have to incorporate BOOLEAN MODE and non-BOOLEAN mode at the same time.
Upvotes: 1
Reputation: 108676
In this document http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html it says "For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results."
If your items table is small -- a sample table, for example -- you are probably hitting this problem and getting a "bizarre" result.
You may wish to try this query IN BOOLEAN MODE
to see if your results match your prediction. Try this.
SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS score
FROM items
ORDER BY score DESC
The Boolean mode disables the word-distribution ranking. Notice that you should understand the difference between natural-language and Boolean modes, and once you have a decently sized table, make a smart choice about which one to use. If you're searching for the kind of tags that blogs have in them, Boolean may be the way to go.
Upvotes: 2
Reputation: 4221
Modify the order to order by score DESC, id DESC.
Assuming the score values are the same, the row with 5 will show up first.
Upvotes: 0