Difference between social network queries

Question

This is sourced from the Stanford Coursera self study DB class SQL quizzes:

Students at your hometown high school have decided to organize their social network using databases. So far, they have collected information about sixteen students in four grades, 9-12. Here's the schema:
Highschooler (ID, name, grade) 
English: There is a high school student with unique ID and a given first name in a certain grade.
Friend (ID1, ID2) 
English: The student with ID1 is friends with the student with ID2. Friendship is mutual, so if (123, 456) is in the Friend table, so is (456, 123).
Likes (ID1, ID2) 
English: The student with ID1 likes the student with ID2. Liking someone is not necessarily mutual, so if (123, 456) is in the Likes table, there is no guarantee that (456, 123) is also present.

Database here

Prompt: For each student A who likes a student B where the two are not friends, find if they have a friend C in common (who can introduce them!). For all such trios, return the name and grade of A, B, and C.

I have a query which generates the correct results, and one which generates incorrect results, but I cannot logically figure out why the below query is incorrect. I have bolded the problem causing line, but am having trouble thinking through it and would appreciate a brief code review.

Working query due to the clause and b.id not in (select id2 from friend where a.id=id1):

select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- this next line
and b.id not in (select id2 from friend where a.id=id1)
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)

Incorrect query due to the clause and b.id <>friend.id2:

select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- as opposed to this one
and b.id <>friend.id2
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)

Query 1 result:

Andrew 10 Cassandra 9 Gabriel 9

Austin 11 Jordan 12 Andrew 10

Austin 11 Jordan 12 Kyle 12

Query 2 result:

Andrew 10 Cassandra 9 Gabriel 9

Brittany 10 Kris 10 Haley 10

Austin 11 Jordan 12 Andrew 10

Austin 11 Jordan 12 Kyle 12

Gabriel 11 Alexis 11 Jessica 11

janos · Accepted Answer

These 2 conditions have very different meanings:

--1
and b.id not in (select id2 from friend where a.id=id1)
--2
and b.id <> friend.id2

Let's say that b.id is jack, and select id2 from friend where a.id=id1 is:

id2
---
alice
bob

In the first query, for the row of jack in the table b, you have effectively the condition "jack is not one of (alice, bob)". Which yields true, so the row will be included in the output.

In the second query, for the row of jack in the table b, you're effectively trying to join with the friend table where id2 is not jack. In our example there are 2 such records, one with alice and one with bob. So in the output you will get 2 records for jack, one for alice and one for bob.

In the first query you either get the record with jack from b or not. In the second query you get n records for jack, where n is the number of records where id2 is not jack. If all records are jack then you'll get no records from b. Completely different things.

Difference between social network queries

Answers (2)

Related Questions