forthrin
forthrin

Reputation: 2777

When do nullable columns affect performance severely?

From what I understand, one should avoid nullable columns in databases whenever possible.

But, in what specific situations do nullable columns actually cause a significant performance drop?

In other words, when does null really hurt performance? (As opposed to when it's negligible, and does not matter at all).

I'm asking so I can know when and how it actually makes a difference.

Upvotes: 2

Views: 2833

Answers (1)

Curt
Curt

Reputation: 5722

Don't know where you heard it, but it's not true.

Nullable columns are there to represent data accurately: if a value is unknown, or not yet entered, NULL is a natural value to store. Null values are no more onerous to store or retrieve than values of any other type: most database servers store them in a single bit, which means that it will take less I/O and processor effort to retrieve a NULL value than assembling a varchar, BLOB, or text field from a bunch of fragments that may require walking through a linked list, or reading more disk blocks off the hard drive.

There are a couple instances marginally related to nullable columns that may affect performance:

  1. If you create an index on a nullable column, and the actual values in the column are sparse (i.e. many rows have a NULL value, or only a very few values are present (as, with, say a controlled vocabulary value), the b-tree data structure used to index the column becomes much less efficient. Index traversals become more expensive operations when half the values in an index are identical: you end up with an unbalanced tree.

  2. Inapt use of NULL values, or inappropriate query techniques that don't utilize NULL values as they were designed often results in poor performance, because progammers often fall back on the bad habit of searching or joining on computed column values, which ignores the fantastic set-processing ability of modern db servers. I've consulted at lots of places where the development staff has made a habit of writing clauses like:

    WHERE ISNULL(myColumn, '') = ''

which means that the DB server cannot use an index directly, and must perform a computation on every single row of that section of the execution tree to evaluate the query. That is not because there is any intrinsic inefficiency in storing, comparing, or evaluating NULL values, but because the query thwarts the strengths of the database engine to achieve a particular result.

Upvotes: 5

Related Questions