Reputation: 3739
This is sourced from the Stanford Coursera self study DB class SQL quizzes:
Students at your hometown high school have decided to organize their social network using databases. So far, they have collected information about sixteen students in four grades, 9-12. Here's the schema:
Highschooler (ID, name, grade)
English: There is a high school student with unique ID and a given first name in a certain grade.
Friend (ID1, ID2)
English: The student with ID1 is friends with the student with ID2. Friendship is mutual, so if (123, 456) is in the Friend table, so is (456, 123).
Likes (ID1, ID2)
English: The student with ID1 likes the student with ID2. Liking someone is not necessarily mutual, so if (123, 456) is in the Likes table, there is no guarantee that (456, 123) is also present.
Database here
Prompt: For each student A who likes a student B where the two are not friends, find if they have a friend C in common (who can introduce them!). For all such trios, return the name and grade of A, B, and C.
I have a query which generates the correct results, and one which generates incorrect results, but I cannot logically figure out why the below query is incorrect. I have bolded the problem causing line, but am having trouble thinking through it and would appreciate a brief code review.
Working query due to the clause and b.id not in (select id2 from friend where a.id=id1)
:
select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- this next line
and b.id not in (select id2 from friend where a.id=id1)
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)
Incorrect query due to the clause and b.id <>friend.id2
:
select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- as opposed to this one
and b.id <>friend.id2
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)
Query 1 result:
Andrew 10 Cassandra 9 Gabriel 9
Austin 11 Jordan 12 Andrew 10
Austin 11 Jordan 12 Kyle 12
Query 2 result:
Andrew 10 Cassandra 9 Gabriel 9
Brittany 10 Kris 10 Haley 10
Austin 11 Jordan 12 Andrew 10
Austin 11 Jordan 12 Kyle 12
Gabriel 11 Alexis 11 Jessica 11
Upvotes: 3
Views: 495
Reputation: 124646
These 2 conditions have very different meanings:
--1 and b.id not in (select id2 from friend where a.id=id1) --2 and b.id <> friend.id2
Let's say that b.id
is jack
, and select id2 from friend where a.id=id1
is:
id2
---
alice
bob
In the first query, for the row of jack
in the table b
, you have effectively the condition "jack is not one of (alice, bob)". Which yields true, so the row will be included in the output.
In the second query, for the row of jack
in the table b
, you're effectively trying to join with the friend
table where id2
is not jack
. In our example there are 2 such records, one with alice
and one with bob
. So in the output you will get 2 records for jack
, one for alice
and one for bob
.
In the first query you either get the record with jack
from b
or not. In the second query you get n
records for jack
, where n
is the number of records where id2
is not jack
. If all records are jack
then you'll get no records from b
. Completely different things.
Upvotes: 2
Reputation: 2282
Your issues seem to come primarily from using ANSI-89 join
s instead of using modern explicit join
s. If you used modern join
s it would be much easier to troubleshoot.
Your first version is making what is now known as an inner join
while your second version is looking for a "not equal". That's why you get different results.
Upvotes: -1