SamAko
SamAko

Reputation: 3615

How to write this complex self join

I have two tables with given fields

  1. HighSchooler

    • name
    • grade
    • id
  2. Likes

    • id1
    • id2

HighSchooler holds info on a high school student and Likes is a relationship table showing who likes who in the high school. Student with id1 likes student with id2. There are both one way relationships (when John likes Lucy but Lucy doesn't like John) and two way relationships (when Albert likes Sandra and Sandra also likes Albert).

I need a query that returns two columns with the names of those in a two way relationship, i.e if A likes B and B likes A, then a sample result-set would be

name | name
A       B

I fiddle with it and came up with this query, but i don't understand it and don't think it is optimal.

SELECT DISTINCT a.name, b.name 
FROM Highschooler a, Highschooler b, Likes l1 
JOIN Likes l2 on l1.ID1=l2.ID2 
WHERE a.ID=l1.ID2 AND b.ID=l1.ID1 AND a.ID=l2.ID1 AND a.ID > b.ID;

Upvotes: 1

Views: 1029

Answers (3)

Jonathan Leffler
Jonathan Leffler

Reputation: 753525

The key self-join is between two references to the Likes table. This then needs to be joined twice to the HighSchoolers table to get the names of the two people.

Step 1 Pairs of IDs where each likes the other

SELECT l1.id1, l1.id2
  FROM Likes AS l1
  JOIN Likes AS l2
    ON l1.id1 = l2.id2 AND l1.id2 = l2.id1;

This gives the list of pairs of IDs where each likes the other.

Step 2 Pairs of IDs where each likes the other without repeats

The only snag is that it gives each pair twice. So, the trick is to note that in one of the two rows, the id1 value is smaller than the id2 value. As a possibly beneficial side-effect, this eliminates anyone who likes themself.

SELECT l1.id1, l1.id2
  FROM Likes AS l1
  JOIN Likes AS l2
    ON l1.id1 = l2.id2 AND l1.id2 = l2.id1;
 WHERE l1.id1 < l1.id2

Step 3 Pairs of Names where each likes the other

Now to tidy it up with names:

SELECT h1.name AS name1, h2.name AS name2
  FROM (SELECT l1.id1, l1.id2
          FROM Likes AS l1
          JOIN Likes AS l2
            ON l1.id1 = l2.id2 AND l1.id2 = l2.id1
         WHERE l1.id1 < l1.id2
       ) AS p
  JOIN HighSchoolers AS h1 ON p.id1 = h1.id
  JOIN HighSchoolers AS h2 ON p.id2 = h2.id

p is mnemonic for 'pairs'.

Upvotes: 1

jurgenreza
jurgenreza

Reputation: 6086

Your query is correct but it is using Cartesian product of tables which as you said is not optimal. When you write select * from a,b all the rows of a and all the rows of b are combined together to form a new table which has size(a)*size(b) rows. You are doing that with three tables so you are creating a huge table and then select from it a few rows that you want. Inner join can do this more efficiently.

SELECT
a.name AS name_a, b.name AS name_b
FROM
HighSchooler AS a
INNER JOIN Likes AS l1
ON a.id = l1.id1
INNER JOIN Likes AS l2
ON l1.id1 = l2.id2 AND l1.id2 = l2.id1 AND l1.id1 < l1.id2
INNER JOIN HighSchooler AS b
ON l1.id2 = b.id;

please see fiddle:

Upvotes: 1

Bojan Dević
Bojan Dević

Reputation: 1875

Try joining Likes table with itself using the rule (l1.id1 = l2.id2) and (l1.id2 = l2.id1)

Example:

SELECT
   a.name AS a_name,
   b.name AS b_name
FROM
   HighSchooler AS a
   INNER JOIN Likes AS l1
      ON (a.id = l1.id1)
   INNER JOIN Likes AS l2
      ON ((l1.id1 = l2.id2) AND (l1.id2 = l2.id1) AND (l1.id1 > l2.id1))
   INNER JOIN HighSchooler AS b
      ON (l2.id1 = b.id)

http://sqlfiddle.com/#!2/44a07/6

Upvotes: 3

Related Questions