Reputation: 6146
Let's assume we have the following scenario, 2 entities; users, images.
Users can like images, and also follow each other. (So we have 2 relational tables, user_likes and follow where who liked what, and who follows who are kept)
So, we (are represented by a user and) want to perform a search to fetch images that are liked by our friends and are named "cat.jpg".
In sql something like that would look like
SElECT DISTINCT(images.id)
FROM images
JOIN likes ON likes.image_id = images.id
JOIN
(SELECT follow.following_id
FROM follow
WHERE follow.follower_id = MY_ID) as following
ON following.following_id = likes.user_id
WHERE images.name = "cat.jpg"
ORDER BY images.date DESC
LIMIT 0, 20
The above query will return the 20 most recent unique ids of the images the users we are following liked and that are (the images) named "cat.jpg".
My question is... How can this procedure be optimized?
The first thought that comes to my mind is caching, but if another user searches for "cat.jpg" s/he will be served different results (because s/he will follow a different set of users). Therefore caching in this specific scenario seems costly since there can be a huge amount of possible search-keywords and a huge set of users-following-users combinations. Is it a viable solution? If that user never again searches for "cat.jpg" then caching the response would simply be a waste of memory.
Generally I 've seen people suggesting using Redis or even Memcached for storing per-user lists of updates or social feed entries, but in a search scenario something like this seems to fall short. No?
Any suggestions, tips, or links with resources discussing similar issues and approaches, are greatly appreciated!
Upvotes: 0
Views: 35
Reputation: 1270713
This is your query (simplified using table aliases):
SElECT DISTINCT i.id
FROM images i JOIN
likes l
ON l.image_id = i.id JOIN
(SELECT f.following_id
FROM follow f
WHERE f.follower_id = MY_ID
) as f
ON f.following_id = l.user_id
WHERE i.name = 'cat.jpg'
ORDER BY i.date DESC
LIMIT 0, 20;
How can we make it run faster? Well, first, the subquery is not needed:
SElECT DISTINCT i.id
FROM images i JOIN
likes l
ON l.image_id = i.id JOIN
follow f
ON f.following_id = l.user_id and
f.follower_id = MY_ID
WHERE i.name = 'cat.jpg'
ORDER BY i.date DESC
LIMIT 0, 20;
Second, the following indexes might help performance:
images(name, date);
likes(image_id, user_id);
follow(user_id, follower_id);
Upvotes: 1
Reputation: 12592
Yes, it is easy solution. It's much more effort to find all the combinations maybe impossible. The same problem is with shortest path in graph problem. Or AZ shortest path.
Upvotes: 0