Pantelis
Pantelis

Reputation: 6146

How to optimize custom-per-user search results

Let's assume we have the following scenario, 2 entities; users, images.
Users can like images, and also follow each other. (So we have 2 relational tables, user_likes and follow where who liked what, and who follows who are kept)

So, we (are represented by a user and) want to perform a search to fetch images that are liked by our friends and are named "cat.jpg".

In sql something like that would look like

SElECT DISTINCT(images.id) 
FROM images 
JOIN likes ON likes.image_id = images.id 
JOIN 
  (SELECT follow.following_id 
   FROM follow 
   WHERE follow.follower_id = MY_ID) as following 
 ON following.following_id = likes.user_id 
WHERE images.name = "cat.jpg"
ORDER BY images.date DESC
LIMIT 0, 20

The above query will return the 20 most recent unique ids of the images the users we are following liked and that are (the images) named "cat.jpg".

My question is... How can this procedure be optimized?

The first thought that comes to my mind is caching, but if another user searches for "cat.jpg" s/he will be served different results (because s/he will follow a different set of users). Therefore caching in this specific scenario seems costly since there can be a huge amount of possible search-keywords and a huge set of users-following-users combinations. Is it a viable solution? If that user never again searches for "cat.jpg" then caching the response would simply be a waste of memory.

Generally I 've seen people suggesting using Redis or even Memcached for storing per-user lists of updates or social feed entries, but in a search scenario something like this seems to fall short. No?

Any suggestions, tips, or links with resources discussing similar issues and approaches, are greatly appreciated!

Upvotes: 0

Views: 35

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270713

This is your query (simplified using table aliases):

SElECT DISTINCT i.id
FROM images i JOIN
     likes l
     ON l.image_id = i.id JOIN 
     (SELECT f.following_id 
      FROM follow f
      WHERE f.follower_id = MY_ID
     ) as f 
    ON f.following_id = l.user_id 
WHERE i.name = 'cat.jpg'
ORDER BY i.date DESC
LIMIT 0, 20;

How can we make it run faster? Well, first, the subquery is not needed:

SElECT DISTINCT i.id
FROM images i JOIN
     likes l
     ON l.image_id = i.id JOIN 
     follow f
     ON f.following_id = l.user_id and
        f.follower_id = MY_ID
WHERE i.name = 'cat.jpg'
ORDER BY i.date DESC
LIMIT 0, 20;

Second, the following indexes might help performance:

images(name, date);
likes(image_id, user_id);
follow(user_id, follower_id);

Upvotes: 1

Cybercartel
Cybercartel

Reputation: 12592

Yes, it is easy solution. It's much more effort to find all the combinations maybe impossible. The same problem is with shortest path in graph problem. Or AZ shortest path.

Upvotes: 0

Related Questions