Reputation: 514
I have SQL statement:
SELECT need.studentid, need.firstname, need.lastname, need.courseid
FROM
(SELECT student.studentid, student.firstname, student.lastname, enrollment.courseid
FROM prereq LEFT JOIN enrollment ON prereq.courseid = enrollment.courseid LEFT JOIN student ON enrollment.studentid = student.studentid) AS need
WHERE need.studentid NOT IN (SELECT studentid
FROM enrollment
WHERE NOT(grade LIKE '%F' OR grade IS NULL OR grade LIKE 'N'))
My goal is to convert this to relational algebra. I've managed to convert Left Joins and inner select, but I don't know how to connect this with second part and how to convert NOT IN to relational algebra.
Upvotes: 0
Views: 326
Reputation: 108400
If we aren't depending on the behavior we'd get if the subquery in the NOT IN (subquery)
would return a NULL value for `studentid`...
(If the subquery returns a NULL, then the NOT IN comparison will never evaluate to TRUE, for any row, so the outer query wouldn't return any rows.)
It could be there's some sort of guarantee that `studentid` won't ever be NULL, like its defined to be NOT NULL or it's the PRIMARY KEY... but we don't see that. And there isn't a studentid IS NOT NULL
condition in that subquery, so we have to anticipate the behavior that will occur if the subquery does return a NULL.
If we aren't interested in retaining that behavior,
then the NOT IN (subquery)
could be expressed as an antijoin pattern.
SELECT need.studentid
, need.firstname
, need.lastname
, need.courseid
FROM ( SELECT student.studentid
, student.firstname
, student.lastname
, enrollment.courseid
FROM prereq
LEFT
JOIN enrollment
ON prereq.courseid = enrollment.courseid
LEFT
JOIN student
ON enrollment.studentid = student.studentid
) need
LEFT
JOIN enrollment aj
ON aj.studentid = need.studentid
AND NOT ( aj.grade LIKE '%F' OR aj.grade IS NULL OR aj.grade LIKE 'N')
WHERE aj.studentid IS NULL
The anti-join pattern looks like an outer join (to return all rows from one side along with matching rows from the other), and a condition in the WHERE clause that excludes all rows that had a match.
This works because at.studentid
is guaranteed to be non-NULL for any matching rows that were found. For any row from need
that didn't have a matching row in aj
, those rows will have NULL value for the columns from aj
. (That's what the outer join basically does... invents a row of NULL values to serve as a placeholder matching row for the join operation.)
For the conditional test in the WHERE clause, all we need to do is test for NULL values in one of the columns from aj
that would be guaranteed to be non-NULL. We could reference the PRIMARY KEY of the table if we knew what that was. We can use the studentid
column because any row that satisfies the equality comparison in the join condition is guaranteed to be non-NULL.)
The SQL antijoin can be translated into relational algebra.
An equivalent result can be returned without an inline view.
SELECT student.studentid
, student.firstname
, student.lastname
, enrollment.courseid
FROM prereq
LEFT
JOIN enrollment
ON prereq.courseid = enrollment.courseid
LEFT
JOIN student
ON enrollment.studentid = student.studentid
LEFT
JOIN enrollment aj
ON aj.studentid = student.studentid
AND NOT ( aj.grade LIKE '%F' OR aj.grade IS NULL OR aj.grade LIKE 'N')
WHERE aj.studentid IS NULL
Upvotes: 1