user2765602
user2765602

Reputation: 57

Suggestion SQL query

Basically what i am trying to do is to suggest people based on common interests.

I have a table of Users.

I have a table of Interested_People where UserID + InterestID is stored.

I have a table of Contactlist where people who are added with each other is stored.

What I want is to only output people who are not your friends.

I searched a lot in internet but couldn't find something like so.

Although I created a query but it is very slow. Now I Kindly request you guys if you can edit my query a bit and make it much more bandwidth & time efficient.

SELECT * 
FROM   users 
WHERE  id IN(SELECT userid 
             FROM   interested_people 
             WHERE  interested_in IN(SELECT interested_in 
                                     FROM   interested_people 
                                     WHERE  userid = [userid]) 
                    AND id NOT IN(SELECT user1 AS my_friends_userid 
                                  FROM   contactlist f 
                                  WHERE  f.user2 = [userid] 
                                    AND accepted = 1 
                                  UNION 
                                  SELECT user2 AS my_friends_userid 
                                  FROM   contactlist f 
                                  WHERE  f.user1 = [userid] 
                                    AND accepted = 1)) 
       AND id != [userid] 
ORDER  BY Rand () 
LIMIT  0, 10; 

This query actually does the job but it takes very long about 16 sec in my local machine. and that's not what I want. I want a fast and reliable one.

Thanks in advance!

Upvotes: 2

Views: 409

Answers (3)

quest4truth
quest4truth

Reputation: 1140

I think this will give you the same results but perform a lot better:

SELECT * FROM Users u
    INNER JOIN interested_people i
    ON u.id = i.userid
WHERE NOT EXISTS
    (SELECT * FROM contacts WHERE user1 = [userid] or user2 = [userid] and accepted=1)
AND id != [userid]
ORDER BY Rand()
LIMIT 0, 10

Skip the ORDER BY clause if that is at all reasonable. That will be the most expensive part

The select and join clauses give you the users who are interested in connecting and the WHERE NOT EXISTS is a performant way to exclude those contacts already listed.

Upvotes: 0

DRapp
DRapp

Reputation: 48139

First, looking at your interested-in query and assuming the "userID" you are testing with is = 1. Sounds like you are trying to get one level away from those user 1 is also interested in...

SELECT userid FROM interested_people 
   WHERE interested_in IN 
         ( SELECT interested_in FROM interested_people 
              WHERE userid = [userid] ) 

Sample Data for Interested_People

userID  Interested_In
1        5
1        7
1        8
2        3
2        5
2        7
7        1
7        2
7        5
8        3

In this case, the innermost returns interested_in values of 5, 7, 8. Then, getting all users who are interested in 5, 7 and 8 would return 2 and 7. (but since both users 2 and 7 are interested in 5, the 2 ID would be returned TWICE thus a possible duplicate join later on. I would do distinct. This same result could be done with the following query which you could sample times with...

SELECT distinct ip2.userid
   from
      interested_people ip
         join interested_people ip2
            ON ip.interested_in = ip2.interested_in
   where
      userid = [parmUserID]

Now, you need to exclude from this list all your contacts already accepted. You could then left-join TWO TIMES for the from/to contact and ensure NULL indicating not one of the contacts... Then join again to user table to get the user details.

SELECT
      u.*
   from
      users u
         JOIN 
         ( SELECT distinct 
                 ip2.userid
              from
                 interested_people ip
                    join interested_people ip2
                       ON ip.interested_in = ip2.interested_in
                       left join contactList cl1
                          ON ip2.userid = cl1.user1
                          AND cl1.accepted = 1
                       left join contactList cl2
                          ON ip2.userid = cl2.user2
                          AND cl2.accepted = 1
              where
                 ip.userid = [parmUserID]
                 AND NOT ip2.userID = [parmUserID] ) PreQuery
         ON u.id = PreQuery.userID
   order by
      RAND()
   limit
      0, 10

I would have two indexes on your contactList table to optimize both left-joins... with user1 and user2 in primary position... Similarly for the interested_people table.

table             index
contactList       ( user1, accepted )
contactList       ( user2, accepted )
interested_people ( userid, interested_in )
interested_people ( interested_in, userid )

I would expect your user table is already indexed on the ID as primary key.

Upvotes: 1

Uueerdo
Uueerdo

Reputation: 15951

Subqueries in WHERE clauses are often slow in MySQL; at least slower than comparable JOINs.

SELECT others.*
FROM interested_people AS userI
INNER JOIN interested_people AS othersI 
   ON userI.interestid = othersI.interestid 
   AND userI.userid <> othersI.userid
INNER JOIN users AS others ON othersI.user_id = others.userid
LEFT JOIN contactlist AS cl 
   ON userI.userid = cl.user1 
   AND others.userid = cl.user2
   AND cl.accepted = 1
WHERE userI.userid = [userid] 
   AND cl.accepted IS NULL
ORDER BY RAND()
LIMIT 0, 10;

Note: intuition makes me wonder if contactlist might be better as a where subquery.

The AND cl.accepted IS NULL ends up processed after the JOINs, resulting in allowing only results that did NOT have a match in contactlist.

If you want to enhance things a bit further:

SELECT others.*, COUNT(1) AS interestsCount
...
GROUP BY others.userid
ORDER BY interestsCount DESC, RAND()
LIMIT 0,10;

This would give you a random selection of the people that share the most interests in common.

Upvotes: 1

Related Questions