user11224591
user11224591

Reputation:

choice of which attribute from the nonpreserved side of the join to filter

I was reading a SQL book which talks about outer join and says:

the choice of which attribute from the nonpreserved side of the join to filter is important. You should choose an attribute that can only have a NULL when the row is an outer row and not otherwise (for example, not a NULL originating from the base table). For this purpose, three cases are safe to consider—a primary key column, a join column, and a column defined as NOT NULL. A primary key column cannot be NULL; therefore, a NULL in such a column can only mean that the row is an outer row. If a row has a NULL in the join column, that row is filtered out by the second phase of the join, so a NULL in such a column can only mean that it’s an outer row. And obviously, a NULL in a column that is defined as NOT NULL can only mean that the row is an outer row. below is the code example enter image description here enter image description here

The Sales.Orders has a shippingid column which can be null, so if I do like this:

SELECT C.custid, C.companyname
FROM Sales.Customers AS C
 LEFT OUTER JOIN Sales.Orders AS O
 ON C.custid = O.custid
WHERE O.shippingid IS NULL;

which produces the same result, so why the choice of which attribute to filter is important?

Upvotes: 0

Views: 68

Answers (2)

Salman Arshad
Salman Arshad

Reputation: 272426

It could be explained with some sample data. Suppose this is the result of the LEFT JOIN query without a WHERE clause:

+----------+------------+-----------+----------+-----------+--------------+
| c.custid | c.custname | o.orderid | o.custid | o.notnull | o.shippingid |
+----------+------------+-----------+----------+-----------+--------------+
|        1 | John       |        11 |        1 |         1 |          101 |
|        1 | John       |        12 |        1 |         1 |          102 |
|        2 | Jack       |      NULL |     NULL |      NULL |         NULL |
|        3 | Jane       |        13 |        3 |         1 |         NULL |
+----------+------------+-----------+----------+-----------+--------------+

Assuming that you know how LEFT JOIN works you will notice that John has 2 orders, Jack has 0 and Jane has 1. Notice that:

  • If there is no matching order then all columns belonging to orders table are NULL
  • If there is a matching order it is still possible to have NULL in nullable columns

So if you were to find rows from the left table that do not have matching record in right table then you need to check the primary key, the join column or a not null column for NULL. Checking nullable column for NULL will simply give you false positives like in the above example where Jane has an (unshipped) order.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271181

If you are using a LEFT JOIN to test for non-existence (instead of NOT EXISTS), then use a JOIN key:

SELECT C.custid, C.companyname
FROM Sales.Customers C LEFT OUTER JOIN
     Sales.Orders O
     ON C.custid = O.custid
WHERE O.custId IS NULL;

Because this is a JOIN key, you know that a NULL value means that there is no match.

That is essentially what the passage is trying to say.

Alternatively, you can use a primary key. I cannot think of a good reason for using any other column, so I would just recommend a JOIN key or primary key. And say that for any other column, you run the risk that the the column itself might be NULL even when the rows match.

Upvotes: 1

Related Questions