neha

Reputation: 45

Efficient way to select records missing in another table

I have 3 tables. Below is the structure:

Now I want to write a query to find students who did not enroll in any course. As far as I can tell, there are multiple ways of fetching this information. Could you please let me know which of these is the most efficient, and why? Also, if there is a better way of doing the same, please let me know.

db2 => select distinct name from student inner join student_course on id not in (select st_id from student_course)

db2 => select name from student minus (select name from student inner join student_course on id=st_id)

db2 => select name from student where id not in (select st_id from student_course)

Thanks in advance!!

Upvotes: 3

Views: 12849

Answers (4)

wildplasser

Reputation: 44230

The canonical (maybe even synoptic) idiom is (IMHO) to use NOT EXISTS :

SELECT *
FROM student st
WHERE NOT EXISTS (
  SELECT *
  FROM student_course nx
  WHERE st.id = nx.st_id
  );

Advantages:

  • NOT EXISTS(...) is very old, so it will probably be present on all platforms, and most optimisers will know how to handle it
  • the nx. correlation name is not leaked into the outer query: the select * in the outer query will only yield fields from the student table, and not the (null) rows from the student_course table, like in the LEFT JOIN ... WHERE ... IS NULL case. This is especially useful in queries with a large number of range table entries.
  • (NOT) IN is error-prone (NULLs), and it might perform badly on some implementations (duplicates and NULLs have to be removed from the result of the uncorrelated subquery)
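
To make the NULL pitfall concrete, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for DB2, and since the table definitions are not shown in the question, the columns (id, name, st_id, course_id) and the sample rows are assumptions. A single NULL st_id is enough to make NOT IN return nothing, while NOT EXISTS is unaffected:

```python
import sqlite3

# Assumed schema: the question does not show the table structure,
# so these columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    -- One enrollment row has a NULL st_id.
    INSERT INTO student_course VALUES (1, 10), (NULL, 20);
""")

# id NOT IN (1, NULL) evaluates to NULL (not TRUE) for ids 2 and 3,
# so no row passes the WHERE clause.
not_in = conn.execute("""
    SELECT name FROM student
    WHERE id NOT IN (SELECT st_id FROM student_course)
    ORDER BY name
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row
# never matches, so it does not disturb the result.
not_exists = conn.execute("""
    SELECT name FROM student st
    WHERE NOT EXISTS (
        SELECT 1 FROM student_course nx WHERE nx.st_id = st.id
    )
    ORDER BY name
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('bob',), ('cat',)]
```

If st_id is declared NOT NULL, the two forms return the same rows; the difference only shows up when NULLs can appear in the subquery result.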

Upvotes: 1

avidD

Reputation: 451

Just as a comment: I would suggest selecting the student id (which is unique) rather than names.

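As another query option, you could left join the two tables, group by student id, and keep only the groups having count(course_id) = 0. (It has to be an outer join: with an inner join, students without courses never appear, so the count is never 0.)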
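
A rough sketch of that grouping approach, run against SQLite via Python's sqlite3 module rather than DB2. The schema and sample rows are assumptions, since the question does not show the table definitions. Note the LEFT JOIN: COUNT(sc.course_id) skips the NULLs produced for unmatched students, which is what yields a count of 0.

```python
import sqlite3

# Assumed schema and data; hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO student_course VALUES (1, 10), (1, 11), (3, 12);
""")

# Students with no enrollment get one joined row whose course_id is NULL;
# COUNT(column) ignores NULLs, so their group counts to 0.
rows = conn.execute("""
    SELECT s.id, s.name
    FROM student s
    LEFT JOIN student_course sc ON sc.st_id = s.id
    GROUP BY s.id, s.name
    HAVING COUNT(sc.course_id) = 0
    ORDER BY s.id
""").fetchall()

print(rows)  # [(2, 'bob')]
```

Grouping by the id as well as the name keeps two students who share a name from being merged into one group.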

Also, I agree that indexes will be more important.

Upvotes: 0

Tomas

Reputation: 59435

The subqueries you use, whether NOT IN, MINUS or anything else, are generally inefficient. The common way to do this is a left join:

select name 
from student 
left join student_course on id = st_id
where st_id is NULL

Using a join is the "normal" and preferred solution.

Upvotes: 9

Dan Bracuk

Reputation: 20794

Using "not in" is generally slow. That makes your second query the most efficient. You probably don't need the brackets though.

Upvotes: 0
