Randy Minder

Reputation: 48412

Nullable vs. non-null varchar data types - which is faster for queries?

We generally prefer to have all our varchar/nvarchar columns non-nullable with an empty string ('') as the default value. Someone on the team suggested that nullable is better because:

A query like this:

Select * From MyTable Where MyColumn IS NOT NULL

is faster than this:

Select * From MyTable Where MyColumn = ''

Does anyone have experience that would confirm whether this is true?
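Rather than trusting either side, you can measure it. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: absolute timings will not carry over between engines) and reuses the question's MyTable/MyColumn names. Note that the question pairs IS NOT NULL with = '', which select opposite sets of rows, so the sketch times the like-for-like pairs instead: IS NULL vs = '', and IS NOT NULL vs <> ''.

```python
import sqlite3
import time

# Assumption: SQLite stands in for SQL Server here, so only the method
# transfers, not the numbers.


def build(nullable: bool, rows: int) -> sqlite3.Connection:
    """Create an in-memory MyTable where every 10th row has no value."""
    conn = sqlite3.connect(":memory:")
    if nullable:
        conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn TEXT)")
        missing = None  # "no value" is stored as NULL
    else:
        conn.execute(
            "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, "
            "MyColumn TEXT NOT NULL DEFAULT '')"
        )
        missing = ""  # "no value" is stored as the empty-string sentinel
    conn.executemany(
        "INSERT INTO MyTable (MyColumn) VALUES (?)",
        ((missing if i % 10 == 0 else f"value {i}",) for i in range(rows)),
    )
    return conn


def best_time(conn: sqlite3.Connection, sql: str, repeat: int = 5):
    """Run `sql` `repeat` times; return (row count, fastest run in seconds)."""
    best, count = float("inf"), 0
    for _ in range(repeat):
        start = time.perf_counter()
        count = len(conn.execute(sql).fetchall())
        best = min(best, time.perf_counter() - start)
    return count, best


def run_benchmark(rows: int = 100_000) -> dict:
    """Time the like-for-like queries on both designs."""
    nullable, sentinel = build(True, rows), build(False, rows)
    cases = {
        "nullable: IS NULL": (nullable, "SELECT * FROM MyTable WHERE MyColumn IS NULL"),
        "sentinel: = ''": (sentinel, "SELECT * FROM MyTable WHERE MyColumn = ''"),
        "nullable: IS NOT NULL": (nullable, "SELECT * FROM MyTable WHERE MyColumn IS NOT NULL"),
        "sentinel: <> ''": (sentinel, "SELECT * FROM MyTable WHERE MyColumn <> ''"),
    }
    return {label: best_time(conn, sql) for label, (conn, sql) in cases.items()}


for label, (count, seconds) in run_benchmark().items():
    print(f"{label:22} {count:>6} rows  {seconds * 1000:.2f} ms")
```

For an answer that matters to your system, run the same comparison against your real engine, data distribution, and indexes.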

Upvotes: 10

Views: 5318

Answers (5)

gbn

Reputation: 432271

If you need NULL, use NULL. Ditto empty string.

As for performance, "it depends".

If you have varchar, the row stores the actual value along with its length. If you have char, the full declared length is always stored. Depending on the engine, NULL may not be stored in-row at all (SQL Server, for example, tracks it in a NULL bitmap).

This means IS NULL is quicker, query for query, but allowing NULL could add COALESCE/NULLIF/ISNULL complexity to your queries.
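To illustrate that extra complexity, here is a minimal sketch (SQLite via Python's sqlite3, with a hypothetical three-row MyTable) of converting between the two representations. SQLite has no ISNULL function, so COALESCE is used in its place; SQL Server's ISNULL(MyColumn, '') behaves similarly for this case.

```python
import sqlite3

# Hypothetical MyTable holding a real value, an empty string, and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn TEXT)")
conn.executemany(
    "INSERT INTO MyTable (id, MyColumn) VALUES (?, ?)",
    [(1, "abc"), (2, ""), (3, None)],
)

# COALESCE maps NULL to ''; NULLIF maps '' to NULL.
rows = conn.execute(
    "SELECT id, COALESCE(MyColumn, '') AS as_empty, NULLIF(MyColumn, '') AS as_null "
    "FROM MyTable ORDER BY id"
).fetchall()
print(rows)  # [(1, 'abc', 'abc'), (2, '', None), (3, '', None)]

# If both '' and NULL can occur, "no value" needs a compound predicate.
missing = [r[0] for r in conn.execute(
    "SELECT id FROM MyTable WHERE NULLIF(MyColumn, '') IS NULL ORDER BY id"
)]
print(missing)  # [2, 3]
```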

So, your colleague is partially correct but may not appreciate it fully.

Blindly using an empty string means using a sentinel value rather than working through the NULL semantics issue.

FWIW and personally:

  • I would tend to use NULL, but don't always. I like to avoid dates like 31 Dec 9999, which is where NULL avoidance leads you.

  • From Cade Roux's answer... I also find discussions about "Is date of death nullable?" pointless. For a field, in practical terms, either there is a value or there isn't.

  • Sentinel values are worse than NULLs. Magic numbers, anyone?
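A small sketch of how a sentinel leaks into results, assuming a hypothetical Contract table (SQLite via Python's sqlite3) that records "no end date yet" both ways: aggregates skip NULLs, but they treat a magic '9999-12-31' as a real date, so every query over the sentinel column has to remember to exclude it.

```python
import sqlite3

# Hypothetical Contract table: EndDate uses NULL for "no end date yet",
# EndDateSentinel uses the magic value '9999-12-31' for the same thing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contract (id INTEGER PRIMARY KEY, "
    "EndDate TEXT, EndDateSentinel TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Contract VALUES (?, ?, ?)",
    [(1, "2023-06-30", "2023-06-30"), (2, None, "9999-12-31")],
)

# MAX and COUNT(column) ignore the NULL, but not the sentinel.
latest = conn.execute(
    "SELECT MAX(EndDate), MAX(EndDateSentinel), "
    "COUNT(EndDate), COUNT(EndDateSentinel) FROM Contract"
).fetchone()
print(latest)  # ('2023-06-30', '9999-12-31', 1, 2)
```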

Upvotes: 4

Joe R.

Reputation: 2052

In a nutshell, NULL = UNKNOWN! Using the date-of-death example, a NULL means the entity could be (1) alive, (2) dead with the date of death unknown, or (3) not known to be either dead or alive.

For numeric columns I always default them to 0 (zero), because somewhere along the line you may have to perform aggregate calculations, and NULL + 123 = NULL.

For alphanumerics I use NULL, since it's the least expensive performance-wise, and it's easier to write "...WHERE a IS NULL" than "...WHERE a = ''". Using "...WHERE a = ' '" (a space) is not a good idea, because a space is not NULL!

For dates, if you have to leave a date column NULL, you may want to add a status indicator column. For the example above:

  • A = Alive

  • D = Dead

  • Q = Dead, date of death not known

  • N = Unknown whether alive or dead
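A sketch of both points using SQLite through Python's sqlite3 (the Amounts and Person tables are hypothetical). It shows that NULL propagates through row-level arithmetic while SUM and COUNT(column) skip NULL rows, and it encodes the status codes as a CHECK constraint. The second CHECK, tying DateOfDeath to status 'D', is an added assumption rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Row-level arithmetic propagates NULL ...
print(conn.execute("SELECT NULL + 123").fetchone()[0])  # None

# ... while aggregates such as SUM and COUNT(column) skip NULL rows.
conn.execute("CREATE TABLE Amounts (x INTEGER)")
conn.executemany("INSERT INTO Amounts VALUES (?)", [(123,), (None,)])
totals = conn.execute("SELECT SUM(x), COUNT(x), COUNT(*) FROM Amounts").fetchone()
print(totals)  # (123, 1, 2)

# Hypothetical Person table with the status indicator described above.
# Assumption: a date of death is allowed only (and required) for status 'D'.
conn.execute("""
    CREATE TABLE Person (
        id INTEGER PRIMARY KEY,
        DeathStatus TEXT NOT NULL CHECK (DeathStatus IN ('A', 'D', 'Q', 'N')),
        DateOfDeath TEXT,
        CHECK ((DeathStatus = 'D' AND DateOfDeath IS NOT NULL)
            OR (DeathStatus <> 'D' AND DateOfDeath IS NULL))
    )
""")
conn.executemany(
    "INSERT INTO Person (id, DeathStatus, DateOfDeath) VALUES (?, ?, ?)",
    [(1, "A", None), (2, "D", "2001-04-01"), (3, "Q", None), (4, "N", None)],
)

try:
    # Alive but with a death date: violates the second CHECK.
    conn.execute("INSERT INTO Person VALUES (5, 'A', '2010-01-01')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```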

Upvotes: 1

John

Reputation: 207

Tell that guy on your team to get his prematurely optimizin' head out of his ass! (But in a nice way).

Developers like that can be poison to the team, full of low-level optimization myths, all of which may be true or have been true at one point in time for some specific vendor or query pattern, or possibly only true in theory but never true in practice. Acting upon these myths is a costly waste of time, and can destroy an otherwise good design.

He probably means well and wants to contribute his knowledge to the team. Unfortunately, he is wrong. Not wrong in the sense of whether a benchmark will prove his statement correct or incorrect. He's wrong in the sense that this is not how you design a database. The question of whether to make a field NULL-able is a question about the domain of the data, for the purposes of defining the type of the field. It should be answered in terms of what it means for the field to have no value.

Upvotes: 3

Cade Roux

Reputation: 89671

On some platforms (and even versions), this is going to depend on how NULLs are indexed.
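One way to check how a given engine handles this is to ask it for the query plan. A minimal sketch in SQLite (via Python's sqlite3), assuming a hypothetical index idx_mycolumn on the question's MyColumn; SQL Server and other engines have their own plan tools and may index NULLs differently.

```python
import sqlite3

# Hypothetical MyTable with an index on the nullable column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn TEXT)")
conn.execute("CREATE INDEX idx_mycolumn ON MyTable (MyColumn)")


def plan(sql: str) -> list:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


for predicate in ("MyColumn IS NULL", "MyColumn = ''"):
    print(predicate, "->", plan(f"SELECT * FROM MyTable WHERE {predicate}"))
```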

My basic rule of thumb for NULLs is:

  1. Don't allow NULLs until justified

  2. Don't allow NULLs unless the data can really be unknown

A good example of this is modeling address lines. If you have an AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me, you either know the address or not, and having partial NULLs in a set of data just asks for trouble when somebody concatenates them and gets NULL (ANSI behavior). You might solve this with allowing NULLs and adding a check constraint - either all the Address information is NULL or none is.
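A sketch of both the problem and the check-constraint fix in SQLite (via Python's sqlite3), whose || operator follows the ANSI NULL-propagating behavior. The Address table is hypothetical; storing '' for a genuinely empty second line keeps the constraint satisfied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ANSI concatenation: one NULL operand makes the whole result NULL.
print(conn.execute("SELECT '1 Main St' || ' ' || NULL").fetchone()[0])  # None

# Hypothetical Address table: either every line is NULL or none is.
conn.execute("""
    CREATE TABLE Address (
        id INTEGER PRIMARY KEY,
        AddressLine1 TEXT,
        AddressLine2 TEXT,
        CHECK ((AddressLine1 IS NULL AND AddressLine2 IS NULL)
            OR (AddressLine1 IS NOT NULL AND AddressLine2 IS NOT NULL))
    )
""")
conn.execute("INSERT INTO Address VALUES (1, '1 Main St', '')")  # known, no line 2
conn.execute("INSERT INTO Address VALUES (2, NULL, NULL)")       # address unknown

try:
    conn.execute("INSERT INTO Address VALUES (3, '1 Main St', NULL)")  # partial NULL
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute(
    "SELECT id, AddressLine1 || ' ' || AddressLine2 FROM Address ORDER BY id"
).fetchall())  # [(1, '1 Main St '), (2, None)]
```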

Similar thing with middle initial/name. Some people don't have one. Is this different from it being unknown and do you care?
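If you do care, one option is to let the two states differ in storage. A sketch in SQLite (via Python's sqlite3) with a hypothetical Person table, where '' means "has no middle name" and NULL means "not known". Note that MiddleName <> '' also excludes the NULL row, because that comparison evaluates to UNKNOWN.

```python
import sqlite3

# Hypothetical Person table: '' = "has no middle name", NULL = "not known".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, MiddleName TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(1, "James"), (2, ""), (3, None)],
)


def ids(where: str) -> list:
    """Return the ids of rows matching the given WHERE predicate."""
    return [r[0] for r in conn.execute(f"SELECT id FROM Person WHERE {where} ORDER BY id")]


print(ids("MiddleName = ''"))     # [2]  known to have none
print(ids("MiddleName IS NULL"))  # [3]  not known
print(ids("MiddleName <> ''"))    # [1]  the NULL row is excluded as well
```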

Also, date of death - what does NULL mean? Not dead? Unknown date of death? Many times a single column is not sufficient to encode knowledge in a domain.

So to me, whether to allow NULLs would depend very much on the semantics of the data first - performance is going to be second, because having data misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.

It might seem like a little thing (in SQL Server the implementation is a bitmask stored with the row), but only allowing NULLs after justification seems to me to work best. It catches things early in development, forces you to address assumptions and understand your problem domain.

Upvotes: 14

Mewp

Reputation: 4715

If you want to know that there is no value, use NULL.

As for speed, IS NULL should be faster, because it doesn't use string comparison.

Upvotes: 6
