Andrew Willis
Andrew Willis

Reputation: 2349

MySQL Join Query - joining tables into themselves many times

I have 4 queries I need to excecute in order to suggest items to users based on items they've already expressed an interest in:

Select 5 random items the user already likes

SELECT item_id
FROM user_items
WHERE user_id = :user_person
ORDER BY RAND()
LIMIT 5

Select 50 people who like the same items

SELECT user_id 
FROM user_items
WHERE user_id != :user_person
AND item_id = :selected_item_list
LIMIT 50

SELECT all items that the original user likes

SELECT item_id
FROM user_items
WHERE user_id = :user_person

SELECT 5 items the user doesn't already like to suggest to the user

SELECT item_id
FROM user_items
WHERE user_id = :user_id_list
AND item_id != :item_id_list
LIMIT 5

What I would like to know is how would I excecute this as one query?

There are a few reasons for me wanting to do this:

  1. at the moment, I have to excecute the 'select 50 people' query 5 times and pick the top 50 people from it
  2. I then have to excecute the 'select 5 items' query 50 * (number of items initial user likes)

Once the query has been excecuted, I intend to store the query result in a cookie (if the user gives consent to me using cookies, otherwise they don't get the 'item suggestion' at all) with the key being a hash of the query, meaning it will only fire once a day / once a week (that's why I return 5 suggestions and select a key at random to display)

Basically, if anybody knows how to write these queries as one query, could you show me and explain what is going on in the query?

Upvotes: 1

Views: 172

Answers (1)

scriptin
scriptin

Reputation: 3120

This will select all items you need:

SELECT DISTINCT ui_items.item_id
FROM user_items AS ui_own
JOIN user_items AS ui_others ON ui_own.item_id = ui_others.item_id
JOIN user_items AS ui_items ON ui_others.user_id = ui_items.user_id
WHERE ui_own.user_id = :user_person 
  AND ui_others.user_id <> :user_person
  AND ui_items.item_id <> ui_own.item_id

(please, check if result are exact same with you version - I tested it on a very small fake data set)

Next you just cache this list and show 5 items randomly, because ORDER BY RAND() is VERY inefficient (non-deterministic query => no caching)

EDIT: Added the DISTINCT to not show duplicate rows.

You can also return a most popular suggestions in descending popularity order by removing DISTINCT and adding the following code to the end of the query:

GROUP BY ui_items.item_id
ORDER BY COUNT(*) DESC
LIMIT 20

To the end of the query which will return the 20 most popular items.

Upvotes: 2

Related Questions