omlfc be
omlfc be

Reputation: 147

Difference between these sql queries

I can't seem to understand why these two queries return different results for the following task: "Find names and grades of students who only have friends in the same grade. Return the result sorted by grade, then by name within each grade."

Tables here: https://lagunita.stanford.edu/c4x/DB/SQL/asset/socialdata.html

The first query:

SELECT DISTINCT h1.name, h1.grade
FROM Highschooler h1, Friend f, Highschooler h2
WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade
ORDER BY h1.grade, h1.name

The second query:

select name, grade from Highschooler
where ID not in (
    select ID1 from Highschooler H1, Friend, Highschooler H2
    where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;

The second one returns the expected result, but not the first one. If anyone cares to clarify, Thanks.

Upvotes: 1

Views: 104

Answers (3)

Jaime Caffarel
Jaime Caffarel

Reputation: 2469

I just wanted to add further clarification on the first query explanations. The first query results in this:

SELECT DISTINCT h1.name, h1.grade FROM Highschooler h1, Friend f, Highschooler h2 WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade ORDER BY h1.grade, h1.name;
    +-----------+-------+
    | name      | grade |
    +-----------+-------+
    | Cassandra |     9 |
    | Gabriel   |     9 |
    | Jordan    |     9 |
    | Tiffany   |     9 |
    | Andrew    |    10 |
    | Brittany  |    10 |
    | Haley     |    10 |
    | Kris      |    10 |
    | Alexis    |    11 |
    | Gabriel   |    11 |
    | Jessica   |    11 |
    | John      |    12 |
    | Jordan    |    12 |
    | Kyle      |    12 |
    | Logan     |    12 |
    +-----------+-------+
    15 rows in set (0,00 sec)

Since you are performing a cartesian product (by means of selecting the same table Highschooler twice), and one of your conditions is h1.grade = h2.grade, you are going to retrieve all students that have at least one friend in the same grade. The only student you are not getting is Austin, which is the only one that doesn't have any friends in his grade.

The second query is explained in Radek's answer.

I hope this helps.

Upvotes: 1

Serg
Serg

Reputation: 22811

It can be standard NULL - related behavior . Demo

create table tble (ID int, col int);
insert tble(ID, col) 
values (1,1),(2,null),(3,2);

select *
from tble 
where col=1;

select *
from tble 
where ID not in (select t2.ID from tble t2 where t2.col<>1);

Because select t2.ID from tble t2 where t2.col<>1 must not return ID 2 as predicate NULL <> 1 does not evaluates to TRUE.

Upvotes: 1

Radek Hofman
Radek Hofman

Reputation: 527

The first query applies three filter in the query simultaneously to all data in tables and returns just those entries matching all the filters. The second query firstly does a subquery where it returns rows matching the subquery condition and then all the IDs which are not there are returned, which includes also IDs for which H1.ID = Friend.ID1 and Friend.ID2 = H2.ID do not hold true. You can try something like:

select name, grade from Highschooler
where where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID  and ID not in (
    select ID1 from Highschooler H1, Friend, Highschooler H2
    where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;

Upvotes: 1

Related Questions