Reputation:
I've never done searching from MYSQL before, but I need to implement a search. I have three tables: articles
, articles_tags
, and tags
.
The table articles
holds the first thing I would like to search on, the title
field.
The table articles_tags
is a pivot table which relates articles
and tags
together. articles_tags
has two fields, that are: articles_id
and tag_id
.
And, the table tags
holds the second thing I would like to search on, the name
field.
My problem is, I need a way to search the title
field, and each of the tags that relate to that article (tags.name
) and return a relevancy (or sort by relevancy) for the specific article.
What would be a good way to implement this? I'm pretty sure it can't be done from just one query so two queries, and then mixing
the relevancies together, would be ok.
Thanks.
Edit: Forgot to say, if I could give more weighting to matching a tag than matching a word in the title, that would be awesome. I'm not really asking for anyone to write the thing, but give me some direction. I'm a bit of a newbie in both PHP and MySQL.
Upvotes: 5
Views: 2894
Reputation: 475
This quick demo query is far from optimized but should be a good starting point
SELECT * FROM
(SELECT a.id, a.title,
MATCH (a.title) AGAINST ('$s_search_term') AS title_score,
SUM(MATCH (t.name) AGAINST ('$s_search_term')
) AS tag_score
FROM articles AS a
LEFT JOIN articles_tags AS at
ON a.id = at.article_id
LEFT JOIN tags AS t
ON t.id = at.tag_id
WHERE MATCH (a.title) AGAINST ('$s_search_term')
OR MATCH (t.name) AGAINST ('$s_search_term')
GROUP BY a.id) AS table1
ORDER BY 2*tag_score + title_score DESC
You may want to normalize tag_score by dividing it by COUNT(t.id). Sorry but it's easier to give the query than to explain how to make it.
Upvotes: 0
Reputation: 13929
Is it worth at this time, recommending that you look at offloading the job of search to something that is actually written just for that purpose?
In our products, we use MySQL to store data, but index all of our data with Lucene (via Solr - but that's irrelevant).
It's worth giving it a look into, because it's relatively simple to set up, it's very powerful and it's a lot easier than trying to manipulate the database into doing what you want.
Sorry this isn't a direct answer to the question, I just feel that this kind of thing is always worth mentioning in this scenario :)
Upvotes: 2
Reputation: 1510
Here is how I have done this in the past. It looks slow, but I think you will find it is not.
I added a little complexity to show what else can easily be done. In this example an article will get 1 point for a partial title match, 2 points for a partial tag match, 3 points for an exact tag match, and 4 points for an exact title match. It then adds those up and sorts by the score.
SELECT
a.*,
SUM(
CASE WHEN a.title LIKE '%keyword%' THEN 1 ELSE 0 END
+
CASE WHEN t.name LIKE '%keyword%' THEN 2 ELSE 0 END
+
CASE WHEN t.name = 'keyword' THEN 3 ELSE 0 END
+
CASE WHEN a.title = 'keyword' THEN 4 ELSE END
) AS score
FROM article a, articles_tags at, tags t
WHERE a.id = at.article_id
AND at.tag_id=t.id
AND (a.title LIKE '%keyword%' OR t.name LIKE '%keyword%')
GROUP BY a.id
ORDER BY score;
NOTES: This will not return articles without tags. I used simple joins to reduce the noise in the query and highlight just what is doing the scoring. To include articles without tags just make the joins left joins.
Upvotes: 1
Reputation: 6140
You might want to look into sphinx, http://www.sphinxsearch.com/
Upvotes: 0
Reputation: 562250
Starting from the answer given by @james.c.funk but making some changes.
SELECT a.id, a.title,
MATCH (a.title) AGAINST (?) AS relevance
FROM articles AS a
LEFT OUTER JOIN (articles_tags AS at
JOIN tags AS t ON (t.id = at.tag_id AND t.name = ?))
ON (a.id = at.article_id)
WHERE MATCH (a.title) AGAINST (? IN BOOLEAN MODE)
ORDER BY IF(t.name IS NOT NULL, 1.0, relevance) DESC;
I assume you want tag matches to match against the full string, instead of using a fulltext search.
Also using one left outer join instead of two, because if a join to articles_tags
is satisfied, then surely there is a tag. Put the tag name comparison inside the join condition instead of in the WHERE
clause.
The boolean mode makes MATCH()
returns 1.0 on a match, which makes it useless as a measure of relevance. So do an extra comparison in the select-list to calculate the relevance. This value is between 0.0 and 1.0. Now we can make a tag match sort higher by treating it as having relevance of 1.0.
Upvotes: 2