henryzo
henryzo

Reputation: 105

postgresql join confusion

I'm trying to make this statement work. but I can't figure it out.

  1. List the names of the students who never took the course Databases.

I have this:

select distinct s1.name, e1.section_id
from students s1
    inner join enrollment e1 on e1.student_id = s1.id
where e1.course_id != 12

but that doesn't remove the student who took that section, so I'm stuck.

Database looks like (I'm soo sorry, I don't know how to insert the database into here)

table students (
    id              integer primary key,
    name            varchar(255),
    graduation_date date,
    major_id        integer references departments(id)
);

insert into students (id, name, graduation_date, major_id) values
    (1, 'Joe', null, 10);
insert into students (id, name, graduation_date, major_id) values
    (2, 'Amy', '2009-04-22', 20);
insert into students (id, name, graduation_date, major_id) values
    (3, 'Max', null, 10);


create table courses (
    id              integer primary key,
    title           varchar(255),
    units           integer,
    department_id   integer references departments(id)
);

insert into courses (id, title, units, department_id) values
    (12, 'Databases', 4, 10);
insert into courses (id, title, units, department_id) values
    (22, 'Compilers', 4, 10);
insert into courses (id, title, units, department_id) values
    (32, 'Calculus 1', 4, 20);

create table sections (
    id              integer primary key,
    course_id       integer not null references courses(id),
    instructor_id   integer references faculty(id),
    year            integer
);

insert into sections (id, course_id, instructor_id, year) values
    (12, 12, 6, 2007);
insert into sections (id, course_id, instructor_id, year) values
    (13, 12, 1, 2008);
insert into sections (id, course_id, instructor_id, year) values
    (14, 22, 1, 2008);
insert into sections (id, course_id, instructor_id, year) values
    (23, 12, 6, 2009);

create table enrollment (
    id          integer primary key,
    student_id  integer not null references students(id),
    section_id  integer not null references sections(id),
    grade_id    integer references grades(id)
);

insert into enrollment (id, student_id, section_id, grade_id) values
    (14, 1, 12, 8);
insert into enrollment (id, student_id, section_id, grade_id) values
    (15, 1, 13, 3);
insert into enrollment (id, student_id, section_id, grade_id) values
    (16, 1, 14, 5);
insert into enrollment (id, student_id, section_id, grade_id) values
    (17, 1, 32, 1);
insert into enrollment (id, student_id, section_id, grade_id) values
    (18, 1, 34, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
    (19, 1, 53, 13);
insert into enrollment (id, student_id, section_id, grade_id) values
    (24, 3, 12, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
    (25, 3, 14, 5);
insert into enrollment (id, student_id, section_id, grade_id) values
    (26, 3, 32, 1);
insert into enrollment (id, student_id, section_id, grade_id) values
    (27, 3, 34, 2);
insert into enrollment (id, student_id, section_id, grade_id) values
    (28, 3, 54, 7);
insert into enrollment (id, student_id, section_id, grade_id) values
    (34, 2, 43, 3);
insert into enrollment (id, student_id, section_id, grade_id) values

Upvotes: 2

Views: 69

Answers (2)

Jakub Kania
Jakub Kania

Reputation: 16487

I think the queries above over complicate the topic a bit so I added my own.

SELECT s.*
FROM students s
   LEFT JOIN (enrollment e
       INNER JOIN sections se
         ON se.id = e.section_id
       INNER JOIN courses c
         ON c.id = se.course_id AND c.title = 'Databases')
     ON s.id = e.student_id
WHERE
  e.id IS NULL

Upvotes: 1

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

No, don't join everything and then try to clean up with DISTINCT. That's a bad approach. Instead think of what you want to select first. Then write the query step by step.

The "students who never took the course Databases" are

  • all students except those who took the course Databases
  • all students who are not in the set of students who took the course Databases
  • all students for whom not exists a Databases course enrolement

I have highlighted the keywords needed. So you have three options:

  • Write a query with EXCEPT
  • Write a query with NOT IN
  • Write a query with NOT EXISTS

Try these and come back here if you have further problems.

UPDATE: Now that you solved it (and even accepted my answer :-), here are some ways to write the query:

Query with IN clause:

select name 
from students 
where id not in 
(
  select student_id 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
); 

Query with EXISTS clause:

select name 
from students 
where not exists
(
  select * 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
  and student_id = students.id
); 

Query with EXCEPT (which is not so good a solution here, because it queries the students table twice, but sometimes EXCEPT is the straight-forward way to a problem). I am using a join on a subquery here instead of WHERE students.id IN (...), just in order to show the technique.

select name 
from students 
join
(
  select id
  from students
  except
  select student_id 
  from enrollment 
  where section_id in 
  (
    select id 
    from sections 
    where course_id = (select id from courses where title = 'Databases')
  )
) found_students on found_students.id = students.id; 

Query with COUNT and HAVING, which looks quite compact. It is however a bit more prone to errors. One thing is not to confuse ON and WHERE in outer joins, another is to count the correct column. We must make sure to count a non-nullable field of table sections, so we are sure none of the student's enrollments matched an actual Databases section.

select s.id, s.name
from students s
left join enrollment e on e.student_id = s.id
left join sections s on s.id = e.section_id 
                     and s.course_id = (select id from courses where title = 'Databases')
group by s.id, s.name
having count(s.id) = 0;

Upvotes: 2

Related Questions