Nicolas Payart

Reputation: 1086

SphinxSearch Ranker=matchany on multiple fields

I am using Sphinx 2.1.4-id64-dev (rel21-r4324).

I want to search over multiple fields, but I do not want duplicate words to increase the weight.

So I am using the ranker=matchany option.

This works as I want when the duplicates are in a single field:

MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) bar') OPTION ranker=matchany;
+------+---------+------+----------+
| id   | val     | val2 | weight() |
+------+---------+------+----------+
|    3 | bar     |      |        1 |
|    4 | bar bar |      |        1 |
+------+---------+------+----------+
2 rows in set (0.00 sec)

The weights are equal, despite the duplicate word in doc 4.

But this no longer works when the duplicates are spread over multiple fields:

MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=matchany;
+------+------+------+----------+
| id   | val  | val2 | weight() |
+------+------+------+----------+
|    2 | foo  | foo  |        2 |
|    1 | foo  |      |        1 |
+------+------+------+----------+
2 rows in set (0.00 sec)

The weight of doc 2 is greater than the weight of doc 1.
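Why the second query behaves differently: the Sphinx documentation gives ranker=matchany as equivalent to the expression ranker sum((word_count+(lcs-1)*max_lcs)*user_weight), summed over the fields that matched. word_count is the number of unique query words in one field, so a repeat inside a field counts once, but every matching field adds its own term. For a single-keyword query lcs is 1 in each matching field, so the (lcs-1)*max_lcs part is 0. The Python model below is not Sphinx code. It assumes plain lowercase/whitespace tokenizing and recomputes both result sets from the sample rows:

```python
from __future__ import annotations

# Simplified model of Sphinx's matchany ranker, for SINGLE-KEYWORD queries only.
# Assumption: fields are lowercased and split on whitespace; real Sphinx
# tokenizing (charset_table, morphology, ...) is more involved.


def word_count(field_text: str, query_words: set[str]) -> int:
    """Unique query words found in one field (Sphinx's per-field word_count)."""
    return len(query_words & set(field_text.lower().split()))


def matchany_weight(fields: dict[str, str], query_words: set[str],
                    field_weights: dict[str, int] | None = None) -> int:
    """sum(word_count * user_weight) over the fields that matched.

    With one keyword, lcs == 1 in every matching field, so the
    (lcs-1)*max_lcs part of the matchany expression is always 0.
    """
    field_weights = field_weights or {}
    total = 0
    for name, text in fields.items():
        wc = word_count(text, query_words)
        if wc:
            total += wc * field_weights.get(name, 1)  # field weights default to 1
    return total


# Rows produced by the sample sql_query in the question
docs = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    3: {"val": "bar", "val2": ""},
    4: {"val": "bar bar", "val2": ""},
}

for keyword in ("bar", "foo"):
    for doc_id, fields in docs.items():
        weight = matchany_weight(fields, {keyword})
        if weight:  # only matching documents are returned
            print(keyword, doc_id, weight)
```

It prints bar 3 1, bar 4 1, foo 1 1 and foo 2 2, which agrees with the two result sets above: the repeated word in doc 4 is counted once, but the second matching field in doc 2 adds a second term to the sum.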

Is there a way to apply a "matchany" ranking mode on multiple fields?

Here is a sample sphinx.conf file:

source nptest
{
        type                    = mysql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypass
        sql_db                  = test
        sql_port                = 3306

        sql_query               = \
                SELECT 1, 'foo' AS val, '' AS val2 \
                UNION \
                SELECT 2, 'foo', 'foo' \
                UNION \
                SELECT 3, 'bar', '' \
                UNION \
                SELECT 4, 'bar bar', ''

        sql_field_string = val
        sql_field_string = val2
}

index nptest
{
        type                    = plain
        source                  = nptest
        path                    = /var/lib/sphinxsearch/data/nptest
        morphology              = none
}

Upvotes: 0

Views: 1668

Answers (2)

Nicolas Payart

Reputation: 1086

After upgrading to Sphinx 2.2.1-id64-beta (r4330), I was able to use the top() aggregate function in a custom expression ranker, like this (the index here has an extra row, id 5, that is not in the sample config above):

MySQL [(none)]> SELECT id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=expr('top((word_count+(lcs-1)*max_lcs)*user_weight)'), field_weights=(val=3,val2=4);
+------+-------------+------+----------+
| id   | val         | val2 | weight() |
+------+-------------+------+----------+
|    2 | foo         | foo  |        4 |
|    1 | foo         |      |        3 |
|    5 | bar bar foo | bar  |        3 |
+------+-------------+------+----------+
3 rows in set (0.00 sec)

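The expression is the default matchany expression, sum((word_count+(lcs-1)*max_lcs)*user_weight), with sum() replaced by top(), which takes the highest per-field value instead of adding the values up. That way, multiple occurrences across multiple fields do not increase the global weight, and if the fields have different weights, the top-weighted field is taken.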
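To see why top() fixes this, here is a small Python model (not Sphinx code) of the per-field term for the single-keyword query foo with field_weights=(val=3,val2=4). It compares aggregating with sum(), as matchany does, against taking the maximum, which is the role top() plays here. Tokenizing is simplified to lowercase plus whitespace splitting, and doc 5 is the extra row from the output above:

```python
from __future__ import annotations

# Compare sum() (what ranker=matchany uses) with top() over per-field scores,
# for a SINGLE-KEYWORD query, so each matching field scores word_count*user_weight.
# Assumption: simple lowercase/whitespace tokenizing, not real Sphinx tokenizing.


def field_scores(fields: dict[str, str], query_words: set[str],
                 field_weights: dict[str, int]) -> list[int]:
    """Per-field (word_count + (lcs-1)*max_lcs) * user_weight; lcs is 1 here."""
    scores = []
    for name, text in fields.items():
        wc = len(query_words & set(text.lower().split()))
        if wc:
            scores.append(wc * field_weights.get(name, 1))
    return scores


def rank(fields, query_words, field_weights, aggregate) -> int:
    """Apply an aggregate (sum or max) across the matching fields."""
    scores = field_scores(fields, query_words, field_weights)
    return aggregate(scores) if scores else 0


weights = {"val": 3, "val2": 4}  # field_weights=(val=3,val2=4)
docs = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    5: {"val": "bar bar foo", "val2": "bar"},  # extra row from the answer's output
}

for doc_id, fields in docs.items():
    print(doc_id,
          "sum:", rank(fields, {"foo"}, weights, sum),
          "top:", rank(fields, {"foo"}, weights, max))  # top() modeled as max
```

It prints 1 sum: 3 top: 3, 2 sum: 7 top: 4 and 5 sum: 3 top: 3. The top values agree with the weight() column above, while sum() would let doc 2's second matching field push it to 7.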

Many thanks to barryhunter for his great help!

Upvotes: 1

barryhunter

Reputation: 21091

You need the expression ranker: http://sphinxsearch.com/docs/current.html#weighting

You can start with the default expression for the matchany ranker and tweak it.

Using doc_word_count instead of sum(word_count) should be useful.
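The suggestion works because doc_word_count is a document-level factor: the Sphinx documentation describes it as the number of unique keywords matched in the entire document, so a word repeated across fields still counts once. The Python model below is not Sphinx code. It uses the same simplified tokenizing assumption as before and shows the count for a few sample rows. Because it is a single number per document, it does not reflect field_weights, which the top() expression in the other answer does take into account.

```python
from __future__ import annotations

# Model of Sphinx's document-level doc_word_count factor: the number of unique
# query keywords matched anywhere in the document, across all fields.
# Assumption: simple lowercase/whitespace tokenizing, not real Sphinx tokenizing.


def doc_word_count(fields: dict[str, str], query_words: set[str]) -> int:
    matched: set[str] = set()
    for text in fields.values():
        matched |= query_words & set(text.lower().split())
    return len(matched)


docs = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    4: {"val": "bar bar", "val2": ""},
    5: {"val": "bar bar foo", "val2": "bar"},
}

print(doc_word_count(docs[1], {"foo"}))         # foo matched in one field
print(doc_word_count(docs[2], {"foo"}))         # still 1: same word in two fields
print(doc_word_count(docs[5], {"foo", "bar"}))  # 2 distinct keywords matched
```

Docs 1 and 2 both get 1 for the query foo, which is the behaviour the question asks for.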

Upvotes: 2
