Reputation: 413826
I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
My grounding in SQL is pretty weak.
Upvotes: 10
Views: 7280
Reputation: 199
Another example where CASE is useful is when using date functions on the varchar columns. adding ISDATE before using say convert(colA,datetime) might not work, and when colA has non-date data the query can error out.
Upvotes: 0
Reputation: 434785
I don't know SQL Server so I can't speak to that.
Given an expression a L b
for some logical operator L
, there is no guarantee that a
will be evaluated before or after b
or even that both a
and b
will be evaluated:
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in
WHERE
andHAVING
clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n
is NULL
for all n
, even NULL < NULL
evaluates to NULL
; furthermore, NULL
isn't true so
null is null or null < 10
is just a complicated way of saying true or null
and that's true
regardless of which sub-expression is evaluated first.
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a
CASE
construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in aWHERE
clause:SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE
to control the order of evaluation as a CASE
is evaluated in order:
Each condition is an expression that returns a
boolean
result. If the condition's result is true, the value of theCASE
expression is the result that follows the condition, and the remainder of theCASE
expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A
is guaranteed to be evaluated before B
, B
before C
, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE
short-circuits buts neither AND
nor OR
short-circuit so you only need to use a CASE
when you need to protect against side effects.
Upvotes: 11
Reputation: 29
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
Upvotes: 2
Reputation: 10780
Nulls can be confusing. The " ... WHERE 1 = CASE ... " is useful if you are trying to pass a Null OR a Value as a parameter ex. "WHERE the_column = @parameter. This post may be helpful Passing Null using OLEDB .
Upvotes: 0
Reputation: 30237
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR
. In your first example, the conditions joined by the OR
restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Upvotes: 1
Reputation: 183466
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL
in an example, so it must not have been a very stern rule. My only concern with OR
is that it has lower precedence than AND
, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20
when that's not what you mean; but the usual solution is parentheses rather than a big CASE
expression.
I think that in most RDBMSes, indices don't include null values, so an index on the_column
wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE
expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Upvotes: 1