Pointy
Pointy

Reputation: 413826

SQL and logical operators and null checks

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:

 ... WHERE the_column IS NULL OR the_column < 10 ...

It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:

 ... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...

(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)

So my questions here are:

  1. Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
  2. If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
  3. What exactly is the issue? Does it have to do with how indexes work or something?

My grounding in SQL is pretty weak.

Upvotes: 10

Views: 7280

Answers (6)

nee21
nee21

Reputation: 199

Another example where CASE is useful is when using date functions on the varchar columns. adding ISDATE before using say convert(colA,datetime) might not work, and when colA has non-date data the query can error out.

Upvotes: 0

mu is too short
mu is too short

Reputation: 434785

I don't know SQL Server so I can't speak to that.

Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:

Expression Evaluation Rules

The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.

As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.

As far as an expression of the form:

the_column IS NULL OR the_column < 10

is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so

null is null or null < 10

is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.

The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:

When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:

SELECT ... WHERE x > 0 AND y/x > 1.5;

But this is safe:

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;

So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:

Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.

So given this:

case when A then Ra
     when B then Rb
     when C then Rc
     ...

A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.

In summary, a CASE short-circuits buts neither AND nor OR short-circuit so you only need to use a CASE when you need to protect against side effects.

Upvotes: 11

L J
L J

Reputation: 29

Instead of

the_column IS NULL OR the_column < 10

I'd do

isnull(the_column,0) < 10

or for the first example

WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...

Upvotes: 2

ron tornambe
ron tornambe

Reputation: 10780

Nulls can be confusing. The " ... WHERE 1 = CASE ... " is useful if you are trying to pass a Null OR a Value as a parameter ex. "WHERE the_column = @parameter. This post may be helpful Passing Null using OLEDB .

Upvotes: 0

Luis Casillas
Luis Casillas

Reputation: 30237

Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.

I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).

I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.

But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!

Upvotes: 1

ruakh
ruakh

Reputation: 183466

I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.

I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.

(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)

Upvotes: 1

Related Questions