alexanoid
alexanoid

Reputation: 25872

MySQL JOIN / IN performance optimization

I have a following MySQL query:

SELECT 
    p.post_id,
    p.date_created,
    p.description, 
    p.last_edited, 
    p.link, 
    p.link_description, 
    p.link_image_url, 
    p.link_title, 
    p.total_comments, 
    p.total_votes, 
    p.type_id, 
    p.user_id 
FROM posts p JOIN posts_to_tribes ptt ON p.post_id=ptt.post_id 
WHERE ptt.tribe_id IN (1, 2, 3, 4, 5) 
GROUP BY p.post_id 
ORDER BY p.last_edited DESC, p.total_votes DESC LIMIT 25

In a non-concurrent environment this query runs ~172ms but in concurrent environment runs 1-2 sec(during the performance testing).

EXPLAIN output:

enter image description here

Indexes on posts_to_tribes table:

enter image description here

Is any way in order to improve performance here ?

Upvotes: 0

Views: 220

Answers (2)

Rick James
Rick James

Reputation: 142540

You need a composite index for posts_to_tribes: INDEX(tribe_id, post_id).

The GROUP BY was to compensate for the JOIN exploding the number of rows. Here is a better workaround, than IN ( SELECT ... ):

SELECT  p.post_id, p.date_created, p.description, p.last_edited,
        p.link, p.link_description, p.link_image_url, p.link_title,
        p.total_comments, p.total_votes, p.type_id, p.user_id
    FROM  posts p
    JOIN  
      ( SELECT  DISTINCT  post_id
            FROM  posts_to_tribes
            WHERE  tribe_id IN (1, 2, 3, 4, 5)
      ) AS ptt USING (post_id)
    ORDER BY  p.last_edited DESC,
              p.total_votes DESC
    LIMIT  25

Upvotes: 1

Lukas Eder
Lukas Eder

Reputation: 221370

You have applied a JOIN operation, when you really wanted to apply a semi-join between your two tables (semi-join in SQL is implemented using IN or EXISTS predicates).

Because you used the wrong type of JOIN, you have then removed the duplicate records again using GROUP BY. That's a lot of wasted CPU cycles right there.

The following query will be much faster:

SELECT 
    p.post_id,
    p.date_created,
    p.description, 
    p.last_edited, 
    p.link, 
    p.link_description, 
    p.link_image_url, 
    p.link_title, 
    p.total_comments, 
    p.total_votes, 
    p.type_id, 
    p.user_id 
FROM posts p 
WHERE p.post_id IN (
  SELECT ptt.post_id
  FROM posts_to_tribes ptt
  WHERE ptt.tribe_id IN (1, 2, 3, 4, 5)
)
ORDER BY p.last_edited DESC, p.total_votes DESC LIMIT 25

You should still have indexes on (p.post_id) and (ptt.tribe_id, ptt.post_id)

Upvotes: 1

Related Questions