Reputation: 1086
Using Sphinx 2.1.4-id64-dev (rel21-r4324)
I want to search over multiple fields, but I do not want duplicate words to increase the weight.
So I am using the ranker=matchany option.
This works as I want when the duplicates are in a single field:
MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) bar') OPTION ranker=matchany;
+------+---------+------+----------+
| id   | val     | val2 | weight() |
+------+---------+------+----------+
|    3 | bar     |      |        1 |
|    4 | bar bar |      |        1 |
+------+---------+------+----------+
2 rows in set (0.00 sec)
=> The weights are equal, despite the duplicate word in doc 4.
But this no longer works when the duplicates are spread over multiple fields:
MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=matchany;
+------+------+------+----------+
| id   | val  | val2 | weight() |
+------+------+------+----------+
|    2 | foo  | foo  |        2 |
|    1 | foo  |      |        1 |
+------+------+------+----------+
2 rows in set (0.00 sec)
=> The weight of doc 2 is greater than the weight of doc 1, although both match the same single word.
Is there a way to apply a "matchany" ranking mode on multiple fields?
Here is a sample sphinx.conf file:
source nptest
{
    type = mysql
    sql_host = localhost
    sql_user = myuser
    sql_pass = mypass
    sql_db = test
    sql_port = 3306
    sql_query = \
        SELECT 1, 'foo' AS val, '' AS val2 \
        UNION \
        SELECT 2, 'foo', 'foo' \
        UNION \
        SELECT 3, 'bar', '' \
        UNION \
        SELECT 4, 'bar bar', ''
    sql_field_string = val
    sql_field_string = val2
}
index nptest
{
    type = plain
    source = nptest
    path = /var/lib/sphinxsearch/data/nptest
    morphology = none
}
Upvotes: 0
Views: 1668
Reputation: 1086
After upgrading to Sphinx 2.2.1-id64-beta (r4330), I was able to use the top() aggregate function in a custom expression ranker, like this:
MySQL [(none)]> SELECT id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=expr('top((word_count+(lcs-1)*max_lcs)*user_weight)'), field_weights=(val=3,val2=4);
+------+-------------+------+----------+
| id   | val         | val2 | weight() |
+------+-------------+------+----------+
|    2 | foo         | foo  |        4 |
|    1 | foo         |      |        3 |
|    5 | bar bar foo | bar  |        3 |
+------+-------------+------+----------+
3 rows in set (0.00 sec)
That way, occurrences of the same word across multiple fields do not increase the overall weight, and if the fields have different weights, the score of the highest-weighted matching field is used.
Many thanks to barryhunter for his great help!
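To see why swapping the aggregate changes the result, here is a rough Python model of the arithmetic. It is only a sketch and not Sphinx code: the function names are made up, it handles a single-keyword query only (so word_count, lcs and max_lcs are all taken to be 1 for a matching field), and it assumes the aggregate functions only iterate over fields that matched, which is how I read the documentation. Python's sum stands in for the default matchany sum(...) and max stands in for top(...).

```python
# Toy model of a matchany-style expression ranker for a single-keyword query.
# This is not Sphinx code; every name here is illustrative.

def field_score(text, keyword, user_weight):
    """Score one field, or return None when the keyword is absent from it."""
    if keyword not in text.split():
        return None
    word_count = 1  # unique matched keywords, so "bar bar" still counts once
    lcs = 1         # assumed: a single matched keyword gives an lcs of 1
    max_lcs = 1     # assumed: the query holds a single keyword
    return (word_count + (lcs - 1) * max_lcs) * user_weight

def rank(doc, keyword, field_weights, aggregate):
    """Combine per-field scores with sum (matchany default) or max (like top())."""
    scores = []
    for field, text in doc.items():
        score = field_score(text, keyword, field_weights[field])
        if score is not None:  # assumed: aggregates only visit matched fields
            scores.append(score)
    return aggregate(scores) if scores else 0

# Documents as they appear in the query outputs in this thread.
docs = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    4: {"val": "bar bar", "val2": ""},
    5: {"val": "bar bar foo", "val2": "bar"},
}
equal = {"val": 1, "val2": 1}
weighted = {"val": 3, "val2": 4}

for doc_id in (2, 1, 5):
    doc = docs[doc_id]
    print(doc_id, rank(doc, "foo", equal, sum), rank(doc, "foo", weighted, max))
```

Under this model, doc 2 gets 2 with sum and equal weights (one point per matching field), but 4 with max and field_weights=(val=3,val2=4), while docs 1 and 5 get 3, which lines up with the weights shown in the outputs above.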
Upvotes: 1
Reputation: 21091
You need the expression ranker: http://sphinxsearch.com/docs/current.html#weighting
You can start with the default expression for matchany and tweak it.
Using doc_word_count instead of sum(word_count) should be useful, because it counts matched keywords once per document rather than once per field.
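As a rough illustration of the difference, the Python sketch below contrasts a per-field sum with a per-document count. It is a toy model, not Sphinx code: the helper names are hypothetical, and it assumes doc_word_count means the number of unique query keywords matched anywhere in the document, while word_count means the number of unique query keywords matched in one field.

```python
# Toy comparison of sum(word_count) and doc_word_count.
# Hypothetical helpers; the factor semantics are assumptions stated above.

def word_count(text, keywords):
    """Unique query keywords found in a single field."""
    return len(set(text.split()) & set(keywords))

def sum_word_count(doc, keywords):
    """Per-field counts added up, so a word repeated across fields counts twice."""
    return sum(word_count(text, keywords) for text in doc.values())

def doc_word_count(doc, keywords):
    """Unique query keywords found anywhere in the document."""
    matched = set()
    for text in doc.values():
        matched |= set(text.split()) & set(keywords)
    return len(matched)

doc1 = {"val": "foo", "val2": ""}
doc2 = {"val": "foo", "val2": "foo"}

for name, doc in (("doc1", doc1), ("doc2", doc2)):
    print(name, sum_word_count(doc, ["foo"]), doc_word_count(doc, ["foo"]))
```

In this model the per-field sum separates doc 2 from doc 1, whereas the per-document count treats them the same.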
Upvotes: 2