Reputation: 41
I'm collecting some data (tweet with hashtag) and working to create a stats with the following table structure:
My stats goal is show how many child and how many impression per tweet
Query :
SELECT parent.tweet_id, parent.tweet_text, parent.tweet_time, parent.tweet_image, parent.user_id, parent.user_name, parent.user_follower, parent.user_following, parent.is_retweet, parent.is_favorite, parent.is_reply,
(
SELECT COUNT(tweet_id)
FROM tweet
WHERE tweet_status = 1 && user_follower > 0 && is_retweet = parent.tweet_id
) as child,
(
SELECT (COALESCE(SUM(user_follower),0) + parent.user_follower)
FROM tweet WHERE tweet_status = 1 && user_follower > 0 && is_retweet = parent.tweet_id
) as impression
FROM tweet AS parent
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND parent.is_favorite = 0 AND parent.is_reply = 0
ORDER BY parent.tweet_time DESC
child
: count total tweet where is_retweet = parent.tweet_id
impression
: parent.user_follower + sum user_follewer where is_retweet = parent.tweet_id
My query is too slow when getting child
and impression
and i dont know how to optimize :(. But, the real problem is when i want to find top 10 impact base on impression, ORDER BY impression
look foolish.
I expect this all help to simplify this query :)
Upvotes: 1
Views: 503
Reputation: 1269753
This is your query (essentially):
SELECT parent.*,
(SELECT COUNT(*)
FROM tweet t
WHERE t.tweet_status = 1 AND t.user_follower > 0 AND
t.is_retweet = parent.tweet_id
) as child,
(SELECT (COALESCE(SUM(t.user_follower), 0) + parent.user_follower)
FROM tweet t
WHERE t.tweet_status = 1 AND t.user_follower > 0 AND
t.is_retweet = parent.tweet_id
) as impression
FROM tweet AS parent
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND
parent.is_favorite = 0 AND parent.is_reply = 0
ORDER BY parent.tweet_time DESC;
Under many circumstances this is probably the best way to write the query. What you need are indexes: tweet(twee_status, is_retweet, is_vaforite, is_replay, tweet_time, tweet_id)
and tweet(is_retweet, tweet_status, user_follower)
. I think that these will remove any aggregation or sorting on the tweet table, handling all the filtering and calculations in the index.
Upvotes: 0
Reputation: 34231
I would start with moving the subqueries out of the select list into the from clause as derived tables. You need only a single subquery, since the 2 subqueries have the same where condition, including the join criterion. The derived table should be grouped by is_retweet because that represents the parent-child relationship. Obviously, the impression still has to be calculated in the select list, since the derived table can only supply the followers of the re-tweets only.
SELECT parent.tweet_id, parent.tweet_text, parent.tweet_time, parent.tweet_image, parent.user_id, parent.user_name, parent.user_follower, parent.user_following, parent.is_retweet, parent.is_favorite, parent.is_reply,
COALESCE(t.child,0) as child,
COALESCE(t.sum_child_follower,0) + parent.user_follower as impression
FROM tweet AS parent
LEFT JOIN
(
SELECT is_retweet, COUNT(tweet_id) as child, SUM(user_follower) as sum_child_follower
FROM tweet
WHERE tweet_status = 1 && user_follower > 0
GROUP BY is_retweet
) as t ON t.is_retweet=parent.tweet_id
WHERE parent.tweet_status = 1 AND parent.is_retweet = 0 AND parent.is_favorite = 0 AND parent.is_reply = 0
ORDER BY parent.tweet_time DESC
The query could be further enhanced using appropriate indexes, but I do not have enough to go on for them. However, a composite index on the fields in the outer query's where criteria seems to be a good place to start - if you have not done so.
Unfortunately, to get only the top 10 impression you have to use the order by calculated impression field and limit clause. It is not going to really speed up the query, since mysql has to calculate all impressions before it could do an ordering.
Upvotes: 2