gregavola
gregavola

Reputation: 2539

Query Optimization for Friends Feed - MySQL

I'm having my weird trouble with a friends feed query - here is the background:

I have 3 tables

checkin - around 13m records
users - around 250k records
friends - around 1.5m records

In the checkin table - it lists activity that are performed by users. (here are numerous indexes, however there is an index on user_id, created_at, and (user_id,created_at). The users table is just the basic user information There is an index on user_id. The friends table has a user_id, target_id and is_approved. There is an index on the (user_id, is_approved) fields.

In my query, I am trying to pull down just a basic friends feed of any users - so I have been doing this:

SELECT checkin_id, created_at
FROM checkin
WHERE (user_id IN (SELECT friend_id from friends where user_id = 1 and is_approved = 1) OR user_id = 1)
ORDER by created_at DESC
LIMIT 0, 15

The goal of the query is just to pull the checkin_id and created_at for all the users' friend plus their activity. It's a pretty simple query, but when a user's friends have tons of recent activity, this query is very quick, here is the EXPLAIN:

 id     select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
 1  PRIMARY     checkin     index   user_id,user_id_2   created_at  8   NULL    15  Using where
 2  DEPENDENT SUBQUERY friends  eq_ref    user_id,friend_id,is_approved,friend_looku...     PRIMARY     8   const,func  1   Using where

As an explanation, user_id is a simple index on user_id - while user_id_2 is an index on user_id and created_at. On the friends table, friends_lookup is the index of user_id and is_approved.

This is a very simple query and get's completed in: Showing rows 0 - 14 (15 total, Query took 0.0073 sec).

However when a user's friends activity is not very recent and there isn't a lot of the data, the same query takes around 5-7 seconds and it has the same EXPLAIN as the previous query - but takes longer.

It doesn't seem to have an affect on more friends, it seems to speed up with more recent activity.

Is there any tips that anyone have to optimize these queries to makes sure they run the same speed irregardless of activity?

Server Setup

This is a dedicated MySQL server running 16GB of RAM. It is running Ubuntu 10.10 and the version of MySQL is 5.1.49

UPDATE

So most people have suggested remove the IN piece and move them into a INNER JOIN:

SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f ON c.user_id = f.friend_id
WHERE f.user_id =1
AND f.is_approved =1
ORDER BY c.created_at DESC
LIMIT 0 , 15

This query is 10x worse - as reported in the EXPLAIN:

 id     select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
 1  SIMPLE  f   ref     PRIMARY,user_id,friend_id,is_approved,friend_looku...   friend_lookup   5   const,const     938     Using temporary; Using filesort
 1  SIMPLE  c   ref     user_id,user_id_2   user_id     4   untappd_prod.f.friend_id    71  Using where

The goal of this query to get all the friends activity, and yours in the same query (instead of having to create two queries and merge the results together and sort by created_at). I also can't remove the index on user_id as it's important piece of another query.

The interesting part is when I run this query on a user account that doesn't have a lot activity, I get this explain:

 id     select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
 1  SIMPLE  f   index_merge     PRIMARY,user_id,friend_id,is_approved,friend_looku...    user_id,friend_lookup  4,5     NULL    11  Using intersect(user_id,friend_lookup); Using wher...
 1  SIMPLE  c   ref     user_id,user_id_2   user_id     4   untappd_prod.f.friend_id    71  Using where

Any advice?

Upvotes: 1

Views: 386

Answers (2)

381
381

Reputation: 21

so.. you have a few things going on here..

  1. in the explain plan .. usually the optimizer will choose whats in "key" and not whats in possible_keys. So thats why you experience when it needs to scan more records when the data is not recent.

  2. on checkin table only ( user_id, created_at ) and created_at is necessary.. you dont need another index for user_id.. the optimizer will use (user_id, created_at ) since user_id is the first order.

try this..

  1. use join between friends and checkin and remove the in clause, such that friends becomes the driving table and you should see that first on the execution path of your explain plan.

  2. with 1 done, you should make sure that checkin is using (user_id, created_dt ) index in the execution path.

  3. write another query for the OR condition where user_id from checkin table is 1. I think your data set should be mutually exclusive for these two sets, it should then be ok .. or else you would not need to have the OR condition after the IN clause in the first place.

  4. remove the user_id index thats by it self as you have user_id, created_at index.

-- your goal is that it uses the index under key not just possible keys.

this should take care of older non recent checkins as well as recent ones.

Upvotes: 2

bobwienholt
bobwienholt

Reputation: 17610

My first suggestion is to remove the dependent subquery and turn it into a join. I've found that MySQL is not good at processing these types of queries. Try this:

SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f
   ON c.user_id = f.friend_id
WHERE f.user_id = 1
   AND f.is_approved = 1
ORDER by c.created_at DESC
LIMIT 0, 15

My second suggestion, since you have a dedicated server, is to use the InnoDB storage engine for all your tables. Make sure that you tweak default InnoDB settings, especially for innodb_buffer_pool_size: http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/

Upvotes: 0

Related Questions