Lothric
Lothric

Reputation: 1318

SphinxSearch ranking

I have 5 fields across 3 tables in my database and I am trying to use SphinxSearch to make a search in those fields. I am trying to achieve this - for example I have three words to make a search word1, word2 and word3. And I want to get the results even if one word is in one field, second word is in another field etc. If I make a search like this:

SELECT * FROM pc_index WHERE MATCH('word1 word2 word3')

it works well and I get right results. But if I add word4 which is not presented in the previous result, it ruins everything and SphinxSearch returns nothing.

Another approach is to use OR operator, so query like this:

SELECT * FROM pc_index WHERE MATCH('word1 | word2 | word3 | word4')

This way I can find something even if there's only one word presented. But I can't find a way to make SphinxSearch sort the results by amount of the query words presented in the results. Another words - I want to get the results the way when first results contains the most amount of searching words, for example word1, word2 and word3, no matter how many times those words appear in the results.

I have tried different rankers, like SPH_RANK_WORDCOUNT, but they don't make any difference. And I can't use php to get the relevance since I use stemming and word from the query can be in the different form in the result.

Is there any way to achieve that performance?

Upvotes: 0

Views: 38

Answers (1)

Lothric
Lothric

Reputation: 1318

Well, if someone would be interested in the same question - the only solution I came up with is this. This is legacy project so here is pure php. The idea is to split the search query to single words, make separate searchings of every word and sort all of the results afterwards using php:

if ($search) {
    require_once 'SphinxClient.php';
    $client = new SphinxClient();
    $client->setServer("sphinxsearch", 9312);
    $client->setLimits(0, 1000);

    /* split the incoming string */
    $words = explode(' ', $search);
    $relevance = [];

    /* make some limits to prevent large amount of queries */
    $limit = 20;

    foreach ($words as $word) {
        if ($word && $limit > 0) {
            /* 
                search for the word itself and also use the wildcard if it's 
                only part of the word 

                enable min_infix_len in the index configuration to make that work
            */
            $result = $client->query($word . ' | *' . $word . '*', 'pc_index');
        
            if ($result !== false) {
                $matches = $result['matches'];

                if ($matches) {
                    /* fill the $relevance array, where the keys are the id and the values are the amounts of incoming words in this exact result */
                    foreach (array_keys($matches) as $id) {
                        if (isset($relevance[$id])) {
                            $relevance[$id]++;
                        } else {
                            $relevance[$id] = 1;
                        }
                    }   
                }
            }

            $limit--;
        }
    }

    /* sort the $relevance by values and get the needed ids */
    arsort($relevance);

    $ids = [0];

    if ($relevance) {
        $ids = array_keys($relevance);
    }

    /* and make query like this with ordering by our relevance */
    $sqlQuery = "
        SELECT * 
        FROM products
        WHERE products.id IN (" . implode(', ', $ids) . ")
        ORDER BY FIELD(products.id, ' . implode(', ', $ids) . ')
    ";
}

This could be easier if you don't use stemming, this way you don't need to make separate queries for each word.

Anyway, I don't like this answer, but this is the only way I found to make it work as I wanted. If someone tells me the better way it would be awesome.

Upvotes: 0

Related Questions