DomeWTF
DomeWTF

Reputation: 2420

SQL select posts that have similar tags

In my MariaDB I have a table called

"Posts" with

id, title, username_id, text, image_url, url

one called "tags" with

id, tag

and one called post_tags with

id, post_id, tag_id.

The thing I want to accomplish is to get 3 posts from the "Posts" table that have the most tags in common with the post that's currently shown on the page.

I'm stuck here not knowing where to start.

EDIT

Posts

id | username_id |   title   |    text     |       image_url      |           url

 1        1         example    example_text  localhost/image.jpg     localhost/first-post
 2        1         example1   example_text  localhost/image1.jpg    localhost/second-post
 3        1         example2   example_text  localhost/image2.jpg    localhost/third-post
 4        1         example4   example_text  localhost/image4.jpg    localhost/fourth-post
...      ...          ...          ...                ...                     ...
...      ...          ...          ...                ...                     ...


Tags

id | tag

 1   herbs
 2   flower
 3   rose

Post_tags

id | post_id | tag_id

 1      1        1
 2      1        2
 3      1        3
 4      2        1
 5      3        1
 6      3        2
 7      4        1
 8      4        2
 9      4        3        

I'd like to return an array with posts.title and posts.image_url selecting the posts that have the most post_tags.tag_id in common with the current one.

As you can see, if we take post n. 1 as the selected post, post n. 4 has the most tags in common with it, post n.3 is in the second position and post n.2 is the third.

example4 | localhost/image4.jpg
example3 | localhost/image3.jpg
example2 | localhost/image2.jpg

I hope I made it clearer. Thank you.

Upvotes: 2

Views: 944

Answers (3)

Rick James
Rick James

Reputation: 142540

This will get you just the post_id values:

SELECT  x.post_id
    FROM  
    (
        SELECT  b.post_id
            FROM  Post_tags a
            JOIN  Post_tags b USING(tag_id)
            WHERE  a.post_id = 1234
              AND  b.post_id != a.post_id
    ) x
    GROUP BY  x.post_id
    ORDER BY  COUNT(*) DESC
    LIMIT  3;

Tips on designing a better mapping table (Post_tags). That will give you the optimal indexes for that table.

To get more info on the 3 posts:

SELECT  p.*
    FROM  
    (
        SELECT  x.post_id
            FROM  
            (
                SELECT  b.post_id
                    FROM  Post_tag a
                    JOIN  Post_tag b USING(tag_id)
                    WHERE  a.post_id = 1234
                      AND  b.post_id != a.post_id 
            ) AS x
            GROUP BY  x.post_id
            ORDER BY  COUNT(*) DESC
            LIMIT  3 
    ) AS y
    JOIN  Posts AS p  ON p.id = y.post_id;

Upvotes: 0

Łukasz Kamiński
Łukasz Kamiński

Reputation: 5950

SELECT p.id, p.title, p.image_url, COUNT(*) as how_many_shared_tags
FROM posts p
JOIN post_tags pt ON pt.post_id = p.id
                 AND pt.tag_id IN(SELECT tag_id FROM post_tags WHERE post_id = 1)
WHERE p.id != 1
GROUP BY p.id, p.title, p.image_url
order by COUNT(*) DESC
LIMIT 3

As requested, explanation for query:

  1. In order to find top 3 posts that share most tags with our "parent" post, we first need to get list of tags that "parent" has => SELECT tag_id FROM post_tags WHERE post_id = 1
  2. Then find posts that have at least one of those tags by searching in table that holds both ID for posts and tags by adding condition tag_id IN(LIST_OF_tag_id_FROM_SUB_SELECT_SHOWN_ABOVE).
  3. Now we know what posts share at least one tag with "parent", so we can count how many tags they have actually in common and sort by it => order by COUNT(*) DESC
  4. Because "parent" post also "shares" those tags and we don't want him in our result, we give additional condition excluding ID of our "parent" => WHERE p.id != 1
  5. Finally we limit result set to 3 rows, because we want only top 3. LIMIT 3
  6. Selecting count is not required, it is only to point out what it counts COUNT(*) as how_many_common_tags

Upvotes: 6

jetblack
jetblack

Reputation: 590

Check this if it gives what you want to get. Then you may need to optimize query.

SELECT
t1.*
FROM posts t1,
(
    SELECT
    post_id
    FROM post_tags t2
    WHERE
    tag_id IN (SELECT tag_id FROM post_tags WHERE post_id = $CURRENT_POST_ID)
    AND NOT post_id = $CURRENT_POST_ID
    GROUP BY post_id
    ORDER BY COUNT(tag_id) DESC
    LIMIT 3
) t2
WHERE
t1.id = t2.post_id

Upvotes: 0

Related Questions