james
james

Reputation:

MySQL Full text boolean search with tags

I've never done searching from MYSQL before, but I need to implement a search. I have three tables: articles, articles_tags, and tags.

The table articles holds the first thing I would like to search on, the title field.

The table articles_tags is a pivot table which relates articles and tags together. articles_tags has two fields, that are: articles_id and tag_id.

And, the table tags holds the second thing I would like to search on, the name field.

My problem is, I need a way to search the title field, and each of the tags that relate to that article (tags.name) and return a relevancy (or sort by relevancy) for the specific article.

What would be a good way to implement this? I'm pretty sure it can't be done from just one query so two queries, and then mixing the relevancies together, would be ok.

Thanks.

Edit: Forgot to say, if I could give more weighting to matching a tag than matching a word in the title, that would be awesome. I'm not really asking for anyone to write the thing, but give me some direction. I'm a bit of a newbie in both PHP and MySQL.

Upvotes: 5

Views: 2894

Answers (6)

james.c.funk
james.c.funk

Reputation: 475

This quick demo query is far from optimized but should be a good starting point

SELECT * FROM
(SELECT a.id, a.title, 
  MATCH (a.title) AGAINST ('$s_search_term') AS title_score,
  SUM(MATCH (t.name) AGAINST ('$s_search_term')
) AS tag_score
FROM articles AS a
LEFT JOIN articles_tags AS at
  ON a.id = at.article_id
LEFT JOIN tags AS t
  ON t.id = at.tag_id
WHERE MATCH (a.title) AGAINST ('$s_search_term') 
  OR MATCH (t.name) AGAINST ('$s_search_term')
GROUP BY a.id) AS table1
ORDER BY 2*tag_score + title_score DESC

You may want to normalize tag_score by dividing it by COUNT(t.id). Sorry but it's easier to give the query than to explain how to make it.

Upvotes: 0

metdos
metdos

Reputation: 13929

Is it worth at this time, recommending that you look at offloading the job of search to something that is actually written just for that purpose?

In our products, we use MySQL to store data, but index all of our data with Lucene (via Solr - but that's irrelevant).

It's worth giving it a look into, because it's relatively simple to set up, it's very powerful and it's a lot easier than trying to manipulate the database into doing what you want.

Sorry this isn't a direct answer to the question, I just feel that this kind of thing is always worth mentioning in this scenario :)

Upvotes: 2

Jackson Miller
Jackson Miller

Reputation: 1510

Here is how I have done this in the past. It looks slow, but I think you will find it is not.

I added a little complexity to show what else can easily be done. In this example an article will get 1 point for a partial title match, 2 points for a partial tag match, 3 points for an exact tag match, and 4 points for an exact title match. It then adds those up and sorts by the score.

SELECT
  a.*,
  SUM(
    CASE WHEN a.title LIKE '%keyword%' THEN 1 ELSE 0 END
    +
    CASE WHEN t.name LIKE '%keyword%' THEN 2 ELSE 0 END
    +
    CASE WHEN t.name = 'keyword' THEN 3 ELSE 0 END
    +
    CASE WHEN a.title = 'keyword' THEN 4 ELSE END
  ) AS score
FROM article a, articles_tags at, tags t
WHERE a.id = at.article_id
AND at.tag_id=t.id
AND (a.title LIKE '%keyword%' OR t.name LIKE '%keyword%')
GROUP BY a.id
ORDER BY score;

NOTES: This will not return articles without tags. I used simple joins to reduce the noise in the query and highlight just what is doing the scoring. To include articles without tags just make the joins left joins.

Upvotes: 1

jhogendorn
jhogendorn

Reputation: 6140

You might want to look into sphinx, http://www.sphinxsearch.com/

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562250

Starting from the answer given by @james.c.funk but making some changes.

SELECT a.id, a.title, 
  MATCH (a.title) AGAINST (?) AS relevance
FROM articles AS a
LEFT OUTER JOIN (articles_tags AS at
  JOIN tags AS t ON (t.id = at.tag_id AND t.name = ?))
  ON (a.id = at.article_id)
WHERE MATCH (a.title) AGAINST (? IN BOOLEAN MODE) 
ORDER BY IF(t.name IS NOT NULL, 1.0, relevance) DESC;

I assume you want tag matches to match against the full string, instead of using a fulltext search.

Also using one left outer join instead of two, because if a join to articles_tags is satisfied, then surely there is a tag. Put the tag name comparison inside the join condition instead of in the WHERE clause.

The boolean mode makes MATCH() returns 1.0 on a match, which makes it useless as a measure of relevance. So do an extra comparison in the select-list to calculate the relevance. This value is between 0.0 and 1.0. Now we can make a tag match sort higher by treating it as having relevance of 1.0.

Upvotes: 2

dimus
dimus

Reputation: 9372

Funny it is the 3rd question about pretty much the same problem I see in 2 days, check out these two posts: 1, 2

Upvotes: -1

Related Questions