Reputation: 147
I can't seem to understand why these two queries return different results for the following task: "Find names and grades of students who only have friends in the same grade. Return the result sorted by grade, then by name within each grade."
Tables here: https://lagunita.stanford.edu/c4x/DB/SQL/asset/socialdata.html
The first query:
SELECT DISTINCT h1.name, h1.grade
FROM Highschooler h1, Friend f, Highschooler h2
WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade
ORDER BY h1.grade, h1.name
The second query:
select name, grade from Highschooler
where ID not in (
select ID1 from Highschooler H1, Friend, Highschooler H2
where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;
The second one returns the expected result, but not the first one. If anyone cares to clarify, Thanks.
Upvotes: 1
Views: 104
Reputation: 2469
I just wanted to add further clarification on the first query explanations. The first query results in this:
SELECT DISTINCT h1.name, h1.grade FROM Highschooler h1, Friend f, Highschooler h2 WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade ORDER BY h1.grade, h1.name;
+-----------+-------+
| name | grade |
+-----------+-------+
| Cassandra | 9 |
| Gabriel | 9 |
| Jordan | 9 |
| Tiffany | 9 |
| Andrew | 10 |
| Brittany | 10 |
| Haley | 10 |
| Kris | 10 |
| Alexis | 11 |
| Gabriel | 11 |
| Jessica | 11 |
| John | 12 |
| Jordan | 12 |
| Kyle | 12 |
| Logan | 12 |
+-----------+-------+
15 rows in set (0,00 sec)
Since you are performing a cartesian product (by means of selecting the same table Highschooler
twice), and one of your conditions is h1.grade = h2.grade
, you are going to retrieve all students that have at least one friend in the same grade. The only student you are not getting is Austin
, which is the only one that doesn't have any friends in his grade.
The second query is explained in Radek's answer.
I hope this helps.
Upvotes: 1
Reputation: 22811
It can be standard NULL - related behavior . Demo
create table tble (ID int, col int);
insert tble(ID, col)
values (1,1),(2,null),(3,2);
select *
from tble
where col=1;
select *
from tble
where ID not in (select t2.ID from tble t2 where t2.col<>1);
Because select t2.ID from tble t2 where t2.col<>1
must not return ID 2 as predicate NULL <> 1
does not evaluates to TRUE.
Upvotes: 1
Reputation: 527
The first query applies three filter in the query simultaneously to all data in tables and returns just those entries matching all the filters. The second query firstly does a subquery where it returns rows matching the subquery condition and then all the IDs which are not there are returned, which includes also IDs for which H1.ID = Friend.ID1
and Friend.ID2 = H2.ID
do not hold true. You can try something like:
select name, grade from Highschooler
where where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and ID not in (
select ID1 from Highschooler H1, Friend, Highschooler H2
where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;
Upvotes: 1