Adrian Yoan
Adrian Yoan

Reputation: 41

MySQL Optimize Subquery & Order By Aggregate

I'm collecting some data (tweet with hashtag) and working to create a stats with the following table structure:

enter image description here

My stats goal is show how many child and how many impression per tweet

Query :

SELECT parent.tweet_id, parent.tweet_text, parent.tweet_time, parent.tweet_image, parent.user_id, parent.user_name, parent.user_follower, parent.user_following, parent.is_retweet, parent.is_favorite, parent.is_reply, 
(
    SELECT COUNT(tweet_id) 
    FROM tweet 
    WHERE tweet_status = 1 && user_follower > 0 && is_retweet = parent.tweet_id
) as child, 
(
    SELECT (COALESCE(SUM(user_follower),0) + parent.user_follower)
    FROM tweet WHERE tweet_status = 1 && user_follower > 0 && is_retweet = parent.tweet_id
) as impression 
FROM tweet AS parent 
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND parent.is_favorite = 0 AND parent.is_reply = 0 
ORDER BY parent.tweet_time DESC

child : count total tweet where is_retweet = parent.tweet_id

impression : parent.user_follower + sum user_follewer where is_retweet = parent.tweet_id

My query is too slow when getting child and impression and i dont know how to optimize :(. But, the real problem is when i want to find top 10 impact base on impression, ORDER BY impression look foolish.

I expect this all help to simplify this query :)

Upvotes: 1

Views: 503

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269753

This is your query (essentially):

SELECT parent.*, 
       (SELECT COUNT(*) 
        FROM tweet t
        WHERE t.tweet_status = 1 AND t.user_follower > 0 AND
              t.is_retweet = parent.tweet_id
       ) as child, 
       (SELECT (COALESCE(SUM(t.user_follower), 0) + parent.user_follower)
        FROM tweet t
        WHERE t.tweet_status = 1 AND t.user_follower > 0 AND
              t.is_retweet = parent.tweet_id
       ) as impression 
FROM tweet AS parent 
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND
      parent.is_favorite = 0 AND parent.is_reply = 0 
ORDER BY parent.tweet_time DESC;

Under many circumstances this is probably the best way to write the query. What you need are indexes: tweet(twee_status, is_retweet, is_vaforite, is_replay, tweet_time, tweet_id) and tweet(is_retweet, tweet_status, user_follower). I think that these will remove any aggregation or sorting on the tweet table, handling all the filtering and calculations in the index.

Upvotes: 0

Shadow
Shadow

Reputation: 34231

I would start with moving the subqueries out of the select list into the from clause as derived tables. You need only a single subquery, since the 2 subqueries have the same where condition, including the join criterion. The derived table should be grouped by is_retweet because that represents the parent-child relationship. Obviously, the impression still has to be calculated in the select list, since the derived table can only supply the followers of the re-tweets only.

SELECT parent.tweet_id, parent.tweet_text, parent.tweet_time, parent.tweet_image, parent.user_id, parent.user_name, parent.user_follower, parent.user_following, parent.is_retweet, parent.is_favorite, parent.is_reply, 
COALESCE(t.child,0) as child,
COALESCE(t.sum_child_follower,0) + parent.user_follower as impression 
FROM tweet AS parent
LEFT JOIN
(
    SELECT is_retweet, COUNT(tweet_id) as child, SUM(user_follower) as sum_child_follower
    FROM tweet 
    WHERE tweet_status = 1 && user_follower > 0
    GROUP BY is_retweet
) as t ON t.is_retweet=parent.tweet_id
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND parent.is_favorite = 0 AND parent.is_reply = 0 
ORDER BY parent.tweet_time DESC

The query could be further enhanced using appropriate indexes, but I do not have enough to go on for them. However, a composite index on the fields in the outer query's where criteria seems to be a good place to start - if you have not done so.

Unfortunately, to get only the top 10 impression you have to use the order by calculated impression field and limit clause. It is not going to really speed up the query, since mysql has to calculate all impressions before it could do an ordering.

Upvotes: 2

Related Questions