nebulousGirl
nebulousGirl

Reputation: 1364

MySQL search query ordered by match relevance

I know basic MySQL querying, but I have no idea how to achieve an accurate and relevant search query.

My table look like this:

   id | kanji
   -------------
   1  | 一子
   2  | 一人子
   3  | 一私人
   4  | 一時
   5  | 一時逃れ

I already have this query:

SELECT * FROM  `definition` WHERE `kanji` LIKE '%一%'

The problem is that I want to order the results from the learnt characters, 一 being a required character for the results of this query.

Say, a user knows those characters: 人,子,時

Then, I want the results to be ordered that way:

   id | kanji
   -------------
   2  | 一人子
   1  | 一子
   4  | 一時
   3  | 一私人
   5  | 一時逃れ

The result which matches the most learnt characters should be first. If possible, I'd like to show results that contain only learnt characters first, then a mix of learnt and unknown characters.

How do I do that?

Upvotes: 0

Views: 190

Answers (2)

Mark Giaconia
Mark Giaconia

Reputation: 3953

Just a thought, but a text index may help, you can get a score back like this:

SELECT match(kanji) against ('your search' in natural language mode) as rank 
FROM  `definition` WHERE match(`kanji`) against ('your search' in natural language mode)
order by rank, length(kanji)

The trick is to index these terms (or words?) the right way. I think the general trick is to encapsulate each word with double quotes and make a space between each. This way the tokenizer will populate the index the way you want. Of course you would need to add/remove the quotes on the way in/out respectively.

Hope this doesn't bog you down.

Upvotes: 0

Paul Draper
Paul Draper

Reputation: 83411

Per your preference, ordering by number of unmatched characters (increasing), and then number of matched character (decreasing).

SELECT *,
    (kanji LIKE '%人%')
  + (kanji LIKE '%子%')
  + (kanji LIKE '%時%') score
FROM kanji 
ORDER BY CHAR_LENGTH(kanji) - score, score DESC

Or, the relational way to do it is to normalize. Create the table like this:

kanji_characters

kanji_id | index | character
----------------------------
      1  |     0 | 一
      1  |     1 | 子
      2  |     0 | 一
      2  |     1 | 人
      2  |     2 | 子
...

Then

SELECT kanji_id,
  COUNT(*) length,
  SUM(CASE WHEN character IN ('人','子','時') THEN 1 END) score
FROM kanji_characters
WHERE index <> 0
  AND kanji_id IN (SELECT kanji_id FROM kanji_characters WHERE index = 0 AND character = '一')
GROUP BY kanji_id
ORDER BY length - score, score DESC

Though you didn't specify what should be done in the case of duplicate characters. The two solutions above handle that differently.

Upvotes: 1

Related Questions