Parziphal
Parziphal

Reputation: 6922

How to search on MySQL using JOINs?

Can anyone tell me ways to do this kind of search in a database?

I got these tables:

posts (id, tags_cache)
tags (id, name)
posts_tags (post_id, tag_id)

The user enters a search query (say "water blue") and I want to show the posts that have both tags. The only way I can think of to search is using FIND_IN_SET, this way:

SELECT p.*, GROUP_CONCAT(t.name) AS tags_search
FROM posts p
LEFT JOIN posts_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id
GROUP BY p.id
HAVING FIND_IN_SET('water', tags_search) > 0
AND FIND_IN_SET('blue', tags_search) > 0

The posts.tags_cache text column stores the names and id of the tags it belongs to (this way: water:15 blue:20).

To avoid JOINs by using this column for search, I've tried LIKE and INSTR but these will give inexact results since you can search for "ter" and you'll gets posts tagged 'water' and 'termal' for example. I've also tried REGEXP which gives exact results, but it's a slow process.

I can't use MATCH as tables use InnoDB.

So... is or are there other ways to accomplish this?

[Edit]

I forgot to mention that the user could search for many tags (not just 2), and even exclude tags: search posts tagged 'water' but not 'blue'. With FIND_IN_SET this works for me:

HAVING FIND_IN_SET('water', tags_search) > 0
AND NOT FIND_IN_SET('blue', tags_search) > 0

[Edit2]

I did some performance test (i.e. only checked how long the queries took, cached) as ypercube suggested, and these are the results:

muists | Bill K | ypercu | includes:excludes
--------------------------
0.0137 | 0.0009 | 0.0029 | 2:0
0.0096 | 0.0081 | 0.0033 | 2:1
0.0111 | 0.0174 | 0.0033 | 2:2
0.0281 | 0.0081 | 0.0025 | 5:1
0.0014 | 0.0013 | 0.0015 | 0:2

I don't know if this info is valid resource... But it shows that ypercube's method with a JOIN per tag is the quickest.

Upvotes: 0

Views: 140

Answers (3)

Bill Karwin
Bill Karwin

Reputation: 563021

Finding posts that match all of several conditions on different rows is a common problem.

Here are two ways to do it:

SELECT p.*
FROM posts p
INNER JOIN posts_tags pt ON p.id = pt.post_id
INNER JOIN tags t ON pt.tag_id = t.id
WHERE t.name IN ('water', 'blue')
GROUP BY p.id
HAVING COUNT(DISTINCT t.name) = 2;

Or:

SELECT p.*
FROM posts p
INNER JOIN posts_tags pt1 ON p.id = pt1.post_id
INNER JOIN tags t1 ON pt1.tag_id = t1.id
INNER JOIN posts_tags pt2 ON p.id = pt2.post_id
INNER JOIN tags t2 ON pt2.tag_id = t2.id
WHERE (t1.name, t2.name) = ('water', 'blue');

Re comment and edit:

The problem with the HAVING solution is that it must perform a table-scan, searching every row in the tables. This is often much slower than a JOIN (when you have appropriate indexes).

To support tag exclusion conditions, here's how I'd write it:

SELECT p.*
FROM posts p
INNER JOIN posts_tags pt1 ON p.id = pt1.post_id
INNER JOIN tags t1 ON pt1.tag_id = t1.id AND t1.name = 'water'
LEFT OUTER JOIN (posts_tags pt2 
INNER JOIN tags t2 ON pt2.tag_id = t2.id AND t2.name = 'blue')
  ON p.id = pt2.post_id
WHERE t2.id IS NULL;

Avoiding using JOINs because you read it somewhere that they are bad is senseless. You must understand that a JOIN is a basic operation in relational databases, and you should use it where the job calls for it.

Upvotes: 3

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115660

For your additional request, excluding some tags, you could use the next approach. It will give you all posts that have both water and blue tags but neither black, white or red:

SELECT p.*
FROM posts p
  INNER JOIN posts_tags pt1 ON p.id = pt1.post_id 
  INNER JOIN tags t1 ON pt1.tag_id = t1.id
  INNER JOIN posts_tags pt2 ON p.id = pt2.post_id
  INNER JOIN tags t2 ON pt2.tag_id = t2.id
WHERE (t1.name, t2.name) = ('water', 'blue')          --- include
  AND NOT EXISTS
      ( SELECT *
        FROM posts_tags pt
          INNER JOIN tags t ON pt.tag_id = t.id
        WHERE p.id = pt.post_id 
          AND t.name IN ('black', 'white', 'red')     --- exclude
      )

Upvotes: 1

mu is too short
mu is too short

Reputation: 434985

I don't understand why you don't want to use JOINs nor why you're trying to use LEFT JOINs. You're looking for things that are there (rather than might be there) so get rid of the LEFT JOINs and just JOIN. And get rid of the tags_cache column, you're only asking for trouble with that sort of thing.

Something like this is what you're looking for:

select p.id
from posts p
join posts_tags pt on p.id = pt.post_id
join tags t on pt.tag_id = t.id
where t.name in ('water', 'blue')
group by p.id
having count(t.id) = 2

The 2 in the HAVING clause is the number of tags you're looking for.

And if you want to exclude certain tags, you could just add that to the WHERE clause like this:

select p.id
from posts p
join posts_tags pt on p.id = pt.post_id
join tags t on pt.tag_id = t.id
where t.name in ('water', 'blue')
  and p.id not in (
    select pt.post_id
    from posts_tags pt
    join tags t on pt.tag_id = t.id
    where t.name in ('pancakes', 'eggs') -- Exclude these
)
group by p.id
having count(t.id) = 2

Upvotes: 4

Related Questions