Database Design Relational Algebra query

Question

I have this schema:

Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)

And this task:

Find the sids of suppliers who supply every part.

What I don't understand is why in this solution we don't work with a negation. I was tempted to put C1.pid <> P.pid instead of C1.pid = P.pid in the end. Can someone explain?

SELECT C.sid
FROM Catalog C
WHERE NOT EXISTS (SELECT P.pid
FROM Parts P
WHERE NOT EXISTS (SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid))

Allan · Accepted Answer

Let's say you have 2 parts and 1 supplier. The supplier has both parts. If you join on <>, your innermost subquery will get two rows back: one for the Catalog entry for Part #1 (because Part #1 <> Part #2 is true); and one for the Catalog entry for Part #2 (likewise).

Your reasoning isn't entirely off, but the way to do that is not to use an inequality, but rather to use an outer join and test for the missing record on the "outer" table:

SELECT c.sid
FROM   catalog c
WHERE  NOT EXISTS
          (SELECT c1.sid
           FROM   catalog c1 LEFT JOIN parts p ON c1.pid = p.pid
           WHERE  c.sid = c1.sid AND p.pid IS NULL)

Personally, I find the nested not exists to be a little confusing and needlessly complex. I would be more likely to solve this problem using count:

SELECT   c.sid
FROM     catalog c
GROUP BY c.sid
HAVING   COUNT (DISTINCT c.pid) = (SELECT COUNT (*) FROM parts)

Database Design Relational Algebra query

Answers (1)

Related Questions