Reputation: 235
I have this schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)
And this task:
Find the sids of suppliers who supply every part.
What I don't understand is why in this solution we don't work with a negation. I was tempted to put C1.pid <> P.pid
instead of C1.pid = P.pid
in the end. Can someone explain?
SELECT C.sid
FROM Catalog C
WHERE NOT EXISTS (SELECT P.pid
FROM Parts P
WHERE NOT EXISTS (SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid))
Upvotes: 1
Views: 86
Reputation: 17429
Let's say you have 2 parts and 1 supplier. The supplier has both parts. If you join on <>
, your innermost subquery will get two rows back: one for the Catalog entry for Part #1 (because Part #1 <> Part #2
is true); and one for the Catalog entry for Part #2 (likewise).
Your reasoning isn't entirely off, but the way to do that is not to use an inequality, but rather to use an outer join and test for the missing record on the "outer" table:
SELECT c.sid
FROM catalog c
WHERE NOT EXISTS
(SELECT c1.sid
FROM catalog c1 LEFT JOIN parts p ON c1.pid = p.pid
WHERE c.sid = c1.sid AND p.pid IS NULL)
Personally, I find the nested not exists
to be a little confusing and needlessly complex. I would be more likely to solve this problem using count
:
SELECT c.sid
FROM catalog c
GROUP BY c.sid
HAVING COUNT (DISTINCT c.pid) = (SELECT COUNT (*) FROM parts)
Upvotes: 2