Reputation:
I was reading a SQL book which talks about outer join and says:
the choice of which attribute from the nonpreserved side of the join to filter is important. You should choose an attribute that can only have a NULL when the row is an outer row and not otherwise (for example, not a NULL originating from the base table). For this purpose, three cases are safe to consider—a primary key column, a join column, and a column defined as NOT NULL. A primary key column cannot be NULL; therefore, a NULL in such a column can only mean that the row is an outer row. If a row has a NULL in the join column, that row is filtered out by the second phase of the join, so a NULL in such a column can only mean that it’s an outer row. And obviously, a NULL in a column that is defined as NOT NULL can only mean that the row is an outer row.
below is the code example
The Sales.Orders has a shippingid column which can be null, so if I do like this:
SELECT C.custid, C.companyname
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
WHERE O.shippingid IS NULL;
which produces the same result, so why the choice of which attribute to filter is important?
Upvotes: 0
Views: 68
Reputation: 272426
It could be explained with some sample data. Suppose this is the result of the LEFT JOIN query without a WHERE clause:
+----------+------------+-----------+----------+-----------+--------------+
| c.custid | c.custname | o.orderid | o.custid | o.notnull | o.shippingid |
+----------+------------+-----------+----------+-----------+--------------+
| 1 | John | 11 | 1 | 1 | 101 |
| 1 | John | 12 | 1 | 1 | 102 |
| 2 | Jack | NULL | NULL | NULL | NULL |
| 3 | Jane | 13 | 3 | 1 | NULL |
+----------+------------+-----------+----------+-----------+--------------+
Assuming that you know how LEFT JOIN works you will notice that John has 2 orders, Jack has 0 and Jane has 1. Notice that:
So if you were to find rows from the left table that do not have matching record in right table then you need to check the primary key, the join column or a not null column for NULL. Checking nullable column for NULL will simply give you false positives like in the above example where Jane has an (unshipped) order.
Upvotes: 0
Reputation: 1271181
If you are using a LEFT JOIN
to test for non-existence (instead of NOT EXISTS
), then use a JOIN
key:
SELECT C.custid, C.companyname
FROM Sales.Customers C LEFT OUTER JOIN
Sales.Orders O
ON C.custid = O.custid
WHERE O.custId IS NULL;
Because this is a JOIN
key, you know that a NULL
value means that there is no match.
That is essentially what the passage is trying to say.
Alternatively, you can use a primary key. I cannot think of a good reason for using any other column, so I would just recommend a JOIN
key or primary key. And say that for any other column, you run the risk that the the column itself might be NULL
even when the rows match.
Upvotes: 1