Reputation: 2387
I'm trying to find users with similar set of interests, with the following schema..
USERS - ID name etc
Interests - ID UID PID
where ID is unique ID for Interests, UIS is user ID and PID is a product ID. I have looked at other similar questions at SO, but none of them had an exact answer.
Example- Let's say I'm interested in getting users with similar interest to John, and this is how to two tables look like ...
ID Name
11 John
12 Mary
13 Scott
14 Tim
ID UID PID
3 12 123
4 12 231
5 12 612
6 13 123
7 13 612
8 14 931
9 14 214
10 11 123
11 11 231
12 11 781
13 11 612
I would like a result with in that order.
I was thinking of doing a set intersection of the user I'm interested in with all other users. It doesn't sound like a very good solution, because it will have to be done everytime a user adds interest or another user is added. Its a small project, and as of now I'll be limiting users to 100. I still think that the above approach will not be efficient at all as it will take 1002 time.
Can someone guide me in the right direction? What are the possible solutions, and which one will be the best with above given constraints. I'm looking at ANN to see if I can use that.
Upvotes: 3
Views: 652
Reputation: 129
The following query finds others users with atleast 2 or more similar interests according to the interests of user 11.
SELECT in2.UID FROM users u
INNER JOIN interest in1 ON (in1.UID = u.ID)
INNER JOIN interest in2 ON (in2.PID = in1.PID AND in2.UID <> u.ID)
WHERE u.ID = 11
GROUP BY in2.UID
HAVING COUNT(in2.UID) >= 2
ORDER BY COUNT(in2.UID) DESC
The ORDER BY ensures that users with the most similar interests ends up first. The HAVING COUNT(in2.UID) >= 2) makes sure the users which are found have atleast 2 or more similar interest.
Upvotes: 0
Reputation: 1269853
This starts by counting the number of interests that each user has in common with John. The approach is to take all of John's interests, join back to the interests table and aggregate to the the count of common interests. Here is the SQL for that:
select i.uid, COUNT(*) as cnt
from (select i.*
from interests i join
users u
on i.uid = i.id
where u.name = 'John'
) ilist join
interests i
on ilist.pid = i.pid and
ilist.uid <> i.uid -- forget about John
group by i.uid
But, you actually want the list of products, rather than just the count. So, you have to join back to the interests table:
select i.*
from (select i.uid, COUNT(*) as cnt
from (select i.*
from interests i join
users u
on i.uid = i.id
where u.name = 'John'
) ilist join
interests i
on ilist.pid = i.pid and
ilist.uid <> i.uid -- forget about John
group by i.uid
) t join
interests i
on t.uid = i.uid
group by t.cnt, i.uid
Upvotes: 3