Reputation: 2539
I'm having my weird trouble with a friends feed query - here is the background:
I have 3 tables
checkin - around 13m records
users - around 250k records
friends - around 1.5m records
In the checkin table - it lists activity that are performed by users. (here are numerous indexes, however there is an index on user_id, created_at, and (user_id,created_at). The users table is just the basic user information There is an index on user_id. The friends table has a user_id, target_id and is_approved. There is an index on the (user_id, is_approved) fields.
In my query, I am trying to pull down just a basic friends feed of any users - so I have been doing this:
SELECT checkin_id, created_at
FROM checkin
WHERE (user_id IN (SELECT friend_id from friends where user_id = 1 and is_approved = 1) OR user_id = 1)
ORDER by created_at DESC
LIMIT 0, 15
The goal of the query is just to pull the checkin_id and created_at for all the users' friend plus their activity. It's a pretty simple query, but when a user's friends have tons of recent activity, this query is very quick, here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY checkin index user_id,user_id_2 created_at 8 NULL 15 Using where
2 DEPENDENT SUBQUERY friends eq_ref user_id,friend_id,is_approved,friend_looku... PRIMARY 8 const,func 1 Using where
As an explanation, user_id is a simple index on user_id - while user_id_2 is an index on user_id and created_at. On the friends table, friends_lookup is the index of user_id and is_approved.
This is a very simple query and get's completed in: Showing rows 0 - 14 (15 total, Query took 0.0073 sec).
However when a user's friends activity is not very recent and there isn't a lot of the data, the same query takes around 5-7 seconds and it has the same EXPLAIN as the previous query - but takes longer.
It doesn't seem to have an affect on more friends, it seems to speed up with more recent activity.
Is there any tips that anyone have to optimize these queries to makes sure they run the same speed irregardless of activity?
Server Setup
This is a dedicated MySQL server running 16GB of RAM. It is running Ubuntu 10.10 and the version of MySQL is 5.1.49
UPDATE
So most people have suggested remove the IN piece and move them into a INNER JOIN:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f ON c.user_id = f.friend_id
WHERE f.user_id =1
AND f.is_approved =1
ORDER BY c.created_at DESC
LIMIT 0 , 15
This query is 10x worse - as reported in the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f ref PRIMARY,user_id,friend_id,is_approved,friend_looku... friend_lookup 5 const,const 938 Using temporary; Using filesort
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
The goal of this query to get all the friends activity, and yours in the same query (instead of having to create two queries and merge the results together and sort by created_at). I also can't remove the index on user_id as it's important piece of another query.
The interesting part is when I run this query on a user account that doesn't have a lot activity, I get this explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f index_merge PRIMARY,user_id,friend_id,is_approved,friend_looku... user_id,friend_lookup 4,5 NULL 11 Using intersect(user_id,friend_lookup); Using wher...
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
Any advice?
Upvotes: 1
Views: 386
Reputation: 21
so.. you have a few things going on here..
in the explain plan .. usually the optimizer will choose whats in "key" and not whats in possible_keys. So thats why you experience when it needs to scan more records when the data is not recent.
on checkin table only ( user_id, created_at ) and created_at is necessary.. you dont need another index for user_id.. the optimizer will use (user_id, created_at ) since user_id is the first order.
try this..
use join between friends and checkin and remove the in clause, such that friends becomes the driving table and you should see that first on the execution path of your explain plan.
with 1 done, you should make sure that checkin is using (user_id, created_dt ) index in the execution path.
write another query for the OR condition where user_id from checkin table is 1. I think your data set should be mutually exclusive for these two sets, it should then be ok .. or else you would not need to have the OR condition after the IN clause in the first place.
remove the user_id index thats by it self as you have user_id, created_at index.
-- your goal is that it uses the index under key not just possible keys.
this should take care of older non recent checkins as well as recent ones.
Upvotes: 2
Reputation: 17610
My first suggestion is to remove the dependent subquery and turn it into a join. I've found that MySQL is not good at processing these types of queries. Try this:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f
ON c.user_id = f.friend_id
WHERE f.user_id = 1
AND f.is_approved = 1
ORDER by c.created_at DESC
LIMIT 0, 15
My second suggestion, since you have a dedicated server, is to use the InnoDB storage engine for all your tables. Make sure that you tweak default InnoDB settings, especially for innodb_buffer_pool_size: http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
Upvotes: 0