Jason Axelrod
Jason Axelrod

Reputation: 7805

MySQL query too slow?

So I have one query:

SELECT EWRporta2_articles.*, xf_thread.*, xf_forum.*, xf_user.*, xf_post.message,
    IF(NOT ISNULL(xf_user.user_id), xf_user.username, xf_thread.username) AS username
FROM EWRporta2_articles
    INNER JOIN xf_thread ON (xf_thread.thread_id = EWRporta2_articles.thread_id)
    INNER JOIN xf_forum ON (xf_forum.node_id = xf_thread.node_id)
    INNER JOIN xf_post ON (xf_post.post_id = xf_thread.first_post_id)
    LEFT JOIN xf_user ON (xf_user.user_id = xf_thread.user_id)
WHERE EWRporta2_articles.article_date < 1417987751
    AND xf_thread.discussion_state = 'visible'
ORDER BY EWRporta2_articles.article_date DESC
LIMIT 0, 5

This query executes in 0.0012 seconds... thats good. What this query does is query for a list of articles, then links them to a thread on my forum.

However, I am trying to slightly alter the query. While the query above requires an article row to exist for the thread. I would like to find threads that link to a specific article OR exist in a specific forum node id. So even if an article row doesn't exist for a thread, if it has a specific node_id, it will still show up. This is the query I have for that:

SELECT EWRporta2_articles.*, xf_thread.*, xf_forum.*, xf_user.*, xf_post.message,
    IF(EWRporta2_articles.article_date IS NULL, xf_thread.post_date, EWRporta2_articles.article_date) AS article_date,
    IF(NOT ISNULL(xf_user.user_id), xf_user.username, xf_thread.username) AS username
FROM xf_thread
    LEFT JOIN EWRporta2_articles ON (EWRporta2_articles.thread_id = xf_thread.thread_id)
    INNER JOIN xf_forum ON (xf_forum.node_id = xf_thread.node_id)
    INNER JOIN xf_post ON (xf_post.post_id = xf_thread.first_post_id)
    LEFT JOIN xf_user ON (xf_user.user_id = xf_thread.user_id)
WHERE ( xf_thread.node_id IN ('66','78') OR EWRporta2_articles.article_date IS NOT NULL )
    AND IF(EWRporta2_articles.article_date IS NULL, xf_thread.post_date, EWRporta2_articles.article_date) < 1417987751
    AND xf_thread.discussion_state = 'visible'
ORDER BY article_date DESC
LIMIT 0, 5

The problem with this query is it executes in 0.5683 seconds.

Is there anything I can do to improve performance here?

Upvotes: 0

Views: 74

Answers (1)

krokodilko
krokodilko

Reputation: 36107

First of all - never ever don't do that:

AND IF(EWRporta2_articles.article_date IS NULL, xf_thread.post_date
           , EWRporta2_articles.article_date) < 1417987751

Using a functon prevents any database to use indexes in order to optimize an expression like that.

Rewrite it into this form:

EWRporta2_articles.article_date IS NULL AND xf_thread.post_date < 1417987751
OR
EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751

And now use laws of boolean algebra to simplify the condition of the query, especially using the law: Distirbutivity of AND over OR: http://en.wikipedia.org/wiki/Boolean_algebra
X AND ( Y OR Z ) => X AND Y OR X AND Z



If you apply this law to this condition:

WHERE 
( xf_thread.node_id IN ('66','78') OR EWRporta2_articles.article_date IS NOT NULL )
AND
(
    EWRporta2_articles.article_date IS NULL AND xf_thread.post_date < 1417987751
    OR
    EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751
)

you will get:

xf_thread.node_id IN ('66','78') AND  EWRporta2_articles.article_date IS NULL AND xf_thread.post_date < 1417987751
OR
xf_thread.node_id IN ('66','78') AND EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751
OR
EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date IS NULL AND xf_thread.post_date < 1417987751
OR
EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751

The third condition is always false, so we can skip it:

EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date IS NULL ... 

The second one and the fourth one:

xf_thread.node_id IN ('66','78') AND EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751
OR
EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751

can be simplified into:

EWRporta2_articles.article_date IS NOT NULL AND EWRporta2_articles.article_date < 1417987751

and it can be further simplified to just:

EWRporta2_articles.article_date < 1417987751

In the end we will get:

WHERE (
     xf_thread.node_id IN ('66','78') 
     AND  
     EWRporta2_articles.article_date IS NULL 
     AND    
     xf_thread.post_date < 1417987751
     OR
     EWRporta2_articles.article_date < 1417987751
  )
  AND xf_thread.discussion_state = 'visible'

And now divide the query into two separate subqueries, then union their results in this way:

SELECT *
FROM (
     SELECT ......
     JOIN ... JOIN ... JOIN ...
     WHERE 
         xf_thread.node_id IN ('66','78') 
         AND EWRporta2_articles.article_date IS NULL 
         AND xf_thread.post_date < 1417987751
         AND xf_thread.discussion_state = 'visible'
     -- ORDER BY article_date DESC
     LIMIT 0, 5
) q1
UNION
SELECT *
FROM (
     SELECT ......
     JOIN ... JOIN ... JOIN ...
     WHERE 
         EWRporta2_articles.article_date < 1417987751
         AND xf_thread.discussion_state = 'visible'
     ORDER BY article_date DESC
     LIMIT 0, 5
) q2
ORDER BY article_date DESC
LIMIT 0, 5

Notice that in the first subquery the clause ORDER BY article_date DESC is skipped - since the condition says: AND EWRporta2_articles.article_date IS NULL, then the date is always NULL, and we can skip redundant sort operation, because it's a waste of time.

Upvotes: 1

Related Questions