Reputation: 1364
I know basic MySQL querying, but I have no idea how to achieve an accurate and relevant search query.
My table look like this:
id | kanji
-------------
1 | 一子
2 | 一人子
3 | 一私人
4 | 一時
5 | 一時逃れ
I already have this query:
SELECT * FROM `definition` WHERE `kanji` LIKE '%一%'
The problem is that I want to order the results from the learnt characters, 一 being a required character for the results of this query.
Say, a user knows those characters: 人,子,時
Then, I want the results to be ordered that way:
id | kanji
-------------
2 | 一人子
1 | 一子
4 | 一時
3 | 一私人
5 | 一時逃れ
The result which matches the most learnt characters should be first. If possible, I'd like to show results that contain only learnt characters first, then a mix of learnt and unknown characters.
How do I do that?
Upvotes: 0
Views: 190
Reputation: 3953
Just a thought, but a text index may help, you can get a score back like this:
SELECT match(kanji) against ('your search' in natural language mode) as rank
FROM `definition` WHERE match(`kanji`) against ('your search' in natural language mode)
order by rank, length(kanji)
The trick is to index these terms (or words?) the right way. I think the general trick is to encapsulate each word with double quotes and make a space between each. This way the tokenizer will populate the index the way you want. Of course you would need to add/remove the quotes on the way in/out respectively.
Hope this doesn't bog you down.
Upvotes: 0
Reputation: 83411
Per your preference, ordering by number of unmatched characters (increasing), and then number of matched character (decreasing).
SELECT *,
(kanji LIKE '%人%')
+ (kanji LIKE '%子%')
+ (kanji LIKE '%時%') score
FROM kanji
ORDER BY CHAR_LENGTH(kanji) - score, score DESC
Or, the relational way to do it is to normalize. Create the table like this:
kanji_characters
kanji_id | index | character
----------------------------
1 | 0 | 一
1 | 1 | 子
2 | 0 | 一
2 | 1 | 人
2 | 2 | 子
...
Then
SELECT kanji_id,
COUNT(*) length,
SUM(CASE WHEN character IN ('人','子','時') THEN 1 END) score
FROM kanji_characters
WHERE index <> 0
AND kanji_id IN (SELECT kanji_id FROM kanji_characters WHERE index = 0 AND character = '一')
GROUP BY kanji_id
ORDER BY length - score, score DESC
Though you didn't specify what should be done in the case of duplicate characters. The two solutions above handle that differently.
Upvotes: 1