Reputation: 1318
I have 5 fields across 3 tables in my database and I am trying to use SphinxSearch to make a search in those fields. I am trying to achieve this - for example I have three words to make a search word1
, word2
and word3
. And I want to get the results even if one word is in one field, second word is in another field etc. If I make a search like this:
SELECT * FROM pc_index WHERE MATCH('word1 word2 word3')
it works well and I get right results. But if I add word4
which is not presented in the previous result, it ruins everything and SphinxSearch returns nothing.
Another approach is to use OR
operator, so query like this:
SELECT * FROM pc_index WHERE MATCH('word1 | word2 | word3 | word4')
This way I can find something even if there's only one word presented. But I can't find a way to make SphinxSearch sort the results by amount of the query words presented in the results. Another words - I want to get the results the way when first results contains the most amount of searching words, for example word1
, word2
and word3
, no matter how many times those words appear in the results.
I have tried different rankers, like SPH_RANK_WORDCOUNT
, but they don't make any difference. And I can't use php
to get the relevance since I use stemming and word from the query can be in the different form in the result.
Is there any way to achieve that performance?
Upvotes: 0
Views: 38
Reputation: 1318
Well, if someone would be interested in the same question - the only solution I came up with is this. This is legacy project so here is pure php
. The idea is to split the search query to single words, make separate searchings of every word and sort all of the results afterwards using php
:
if ($search) {
require_once 'SphinxClient.php';
$client = new SphinxClient();
$client->setServer("sphinxsearch", 9312);
$client->setLimits(0, 1000);
/* split the incoming string */
$words = explode(' ', $search);
$relevance = [];
/* make some limits to prevent large amount of queries */
$limit = 20;
foreach ($words as $word) {
if ($word && $limit > 0) {
/*
search for the word itself and also use the wildcard if it's
only part of the word
enable min_infix_len in the index configuration to make that work
*/
$result = $client->query($word . ' | *' . $word . '*', 'pc_index');
if ($result !== false) {
$matches = $result['matches'];
if ($matches) {
/* fill the $relevance array, where the keys are the id and the values are the amounts of incoming words in this exact result */
foreach (array_keys($matches) as $id) {
if (isset($relevance[$id])) {
$relevance[$id]++;
} else {
$relevance[$id] = 1;
}
}
}
}
$limit--;
}
}
/* sort the $relevance by values and get the needed ids */
arsort($relevance);
$ids = [0];
if ($relevance) {
$ids = array_keys($relevance);
}
/* and make query like this with ordering by our relevance */
$sqlQuery = "
SELECT *
FROM products
WHERE products.id IN (" . implode(', ', $ids) . ")
ORDER BY FIELD(products.id, ' . implode(', ', $ids) . ')
";
}
This could be easier if you don't use stemming, this way you don't need to make separate queries for each word.
Anyway, I don't like this answer, but this is the only way I found to make it work as I wanted. If someone tells me the better way it would be awesome.
Upvotes: 0