Reputation: 251
I'm just stepping into a project and it has a fairly large database backend. I've started digging through this database and 95% of the fields are nullable.
Is this normal practice in the database world? I'm just a lowly programmer, not a DBA but I would think you would want to keep nullable fields to a minimum, only where they make sense.
Is it a "code smell" if most columns are nullable?
Upvotes: 21
Views: 6609
Reputation: 1216
As mentioned by others, front-facing data entry should allow omittance of many fields. This is complicated by how people interpret the trinary nature of NULL
(e.g. empty versus missing).
As such, I am only answering about one facet of database design: foreign keys.
In general, foreign keys do not suffer from the arbitrary nature of business logic, therefore seeing these columns allowing NULL
is definitely a code smell.
For example, if you had a [Person]
table, in no situation would you ever have a [Person].[FatherID]
value that was NULL
intentionally.
For a large database, an attempt to save NULL
to such a column is likely to occur at some point due to the inevitability of bugs, which would have been brought to light much sooner by having a NOT NULL
constraint. So for version 1 or a table, you should never allow nullable columns without justification.
But things get much trickier in an evolving code base, especially one that is staying online and thus requires migration scripting to upgrade. In particular, you may find nullable columns added to tables later on, because properly adding them as non-nullable can be quite hard depending on your integration process.
Furthermore, visual table designers (such as in SQL Server Management Studio and Visual Studio) default to allowing NULL
so it could simply be a matter of inadequate code review.
I don't want to attempt a proper answer for flag (i.e. boolean) columns, but I strongly suggest considering how they can be implemented without allowing NULL
, since I have usually found ways to avoid nullability even under the constraints of business logic.
Upvotes: 0
Reputation: 33098
To throw the opposite opinion out there. Every single field in a database should nullable. There is nothing more frustrating than working with a database that on every single insert throws an exception about required this or required that. Nothing should be required.
There is one exception to that, keys. Obviously all primary and foreign keys should be enforced to exist.
It should be the application's job to validate data and the database to simply store and retrieve what you give it. Having it process validation logic even as simple as null or not null makes a project way more complex to maintain for having different rules spread over everything.
Upvotes: -1
Reputation: 89661
In short, I would say yes, this is probably a code smell.
Whether a column is nullable or not is very important and should be determined carefully. The question should be assessed for every column. I am not a believer in a single "best practices" default for NULL
. The "best practice" for me is to address the nullability thoroughly during the design and/or refactoring of the table.
To start with, none of your primary key columns are going to be nullable. Then, I strongly lean towards NOT NULL
for anything which is a foreign key.
Some other things I consider:
Criteria where NULL
should be strongly avoided:
money
columns - is there really a possibility that this amount will be unknown?
Criteria where NULL
can be justified most frequently:
datetime
columns - there are no reserved dates, so NULL
is effectively your best option
Other data types:
char
/varchar
columns - for codes/identifiers - NOT NULL
almost exclusively
int
columns - mostly NOT NULL
unless it's something like "number of children" where you want to distinguish an unknown response.
Upvotes: 6
Reputation: 5086
I'm of the Extreme NO camp: I avoid NULLs all the time. Putting aside fundamental considerations about what they actually mean (because talk to different people, you'll get different answers such as "no value", "unknown value", "missing", "my ginger cat called Null"), the worst problem NULLs cause is that they often ruin your queries in mysterious ways.
I've lost count of the number of times I've had to debug someone's query (okay, maybe 9) and traced the problem to a join against a NULL. If your code needs ISNULL to repair joins then the chances are you've also lost index applicability and performance with it.
If you do have to store a "missing/unknown/null/cat" value (and it's something I prefer to avoid), it is better to be explicit about it.
Those skilled at NULLs may disagree. NULL use tends to split SQL crowds down the middle.
In my experience, heavy NULL use has been positively correlated with database abuse but I wouldn't carve this into stone tablets as some Law of Nature. My experience is just my experience.
EDIT: Additional thought. It is possible that those who are anti-null racists like myself are more excited by normalization than those who are pro-NULL. I don't think rabid normalizers would be too happy with ragged edges on their tables that can take NULLs. Lots of nulls may indicate that the the database developers are not into heavy normalisation. So rather than NULL suggesting code is "bad" it may alternatively suggest the philosophical position of the developers on normalisation. Maybe this is reaching. Just a thought.
Upvotes: 11
Reputation: 51917
One of the many ways to map inheritance (e.g. c# objects) to a database is to create a table for the class at the top of the hierarchy, then add the columns for all the other classes. The columns have to be nullable for when an object of a different subclass is stored in the database. This is called Single-table inheritance mapping (or Map Hierarchy To A Single Table) and is a standard design pattern.
A side effect of Single-table inheritance mapping is that most columns are nullable.
Also in Oracle an empty string (0 length) is considered to be null, therefore in some companies all strings columns are made nullable even on SqlServer. (just because the first customer wants the software on SqlServer does not mean the 2nd customer does not have a Oracle DBA that will not let SqlServer onto there network)
Upvotes: 0
Reputation: 465
In my experience, a lot nullable field in a large database like you have is very normal. Considering it perhaps is used by a lot of applications written by different people. Making columns nullable is annoying but it is perhaps the best way to keep the application robust.
Upvotes: -1
Reputation: 96572
That seems like a lot, it probably means you should at least investigate. Note that if this is mature product with a lot of data, convincing anyone to change the structure may be difficult. The earlier in the design phase you catch something like this the easier it is to fix up all the related code to adjust for the change.
Whether it is bad that they used the nulls would depend on whether the columns allowing nulls look as if they should be related tables (home phone, cell phone, business phone etc which should be in aspearate phone table) or if they look like things that might not be applicable to all records (possibly could bea related table with a one-to-one relationship)or might not be known at the time of data entry (probably ok). I would also check to see if they in fact alwAys do have a value (then you might be able to change to not null if the information is genuinely required by the busniess logic). If you have a few records with null
Upvotes: 0
Reputation: 33143
In my experience, it is a problem when Null and Not Null don't match up to the required field /not required field.
It is in the realm of possibility that those really are all optional fields. If you find in the business tier or the UI tier that those fields are required, then I think this means the data model has drifted away from the business object model and is a sign of overly conservative DB change policies, or oversight.
If you run a sample data generator on your data, and then try to load the data that is valid according to SQL, you would find out right away if the rules match up.
Upvotes: 2
Reputation: 16558
I think nullable columns should be avoided. Wherever the semantics of the domain make it possible to use a value that clearly indicates missing data, it should be used instead of NULL.
For instance, let's imagine a table that contains a Comment
field. Most developers would place a NULL here to indicate that there's no data in the column. (And, hopefully, a check constraint that disallows zero-length strings so that we have a well-known "value" to indicate the lack of a value.) My approach is usually the opposite. The Comment
column is NOT NULL
and a zero-length string indicates the lack of a value. (I use a check constraint to ensure that the zero-length string is really a zero-length string, and not whitespace.)
So, why would I do this? Two reasons:
NULL
s require special logic in SQL, and this technique avoids that.NULL
. For instance, if you use Microsoft's ADO.NET, the constant DBNull.Value
indicates a NULL, and you have to test for that. Using a zero-length string on a NOT NULL
column obviates the need.Despite all of this, there are many circumstances in which NULL
s are fine. In fact, I have no objection to their use in the scenario above, although it wouldn't be my preferred way.
Whatever you do, be kind to those who will use your tables. Be consistent. Allow them to SELECT
with confidence. Let me explain what I mean by this. I recently worked on a project whose database was not designed by me. Nearly every column was nullable and had no constraints. There was no consistency about what represented the absence of a value. It could be NULL
, a zero-length string, or even a bunch of spaces, and often was. (How that soup of values got there, I don't know.)
Imagine the ugly code a developer has to write to find all of those records with a missing Comment
field in this scenario:
SELECT * FROM Foo WHERE LEN(ISNULL(Comment, '')) = 0
Amazingly there are developers who regard this as perfectly acceptable, even normal, despite possible performance implications. Better would be:
SELECT * FROM Foo WHERE Comment IS NULL
Or
SELECT * FROM Foo WHERE Comment = ''
If your table is properly designed, the above two SQL statements can be relied upon to produce quality data.
Upvotes: 8
Reputation: 562338
Anyone who has developed a data entry application knows how common it is for some of the fields to be unknown at the time of entry -- even for columns that are business-critical, to address @Chris McCall's answer.
However, a "code smell" is merely an indicator that something might be coded in a sloppy way. You use smells to identify things that need more investigation, not necessarily things that must be changed.
So yes, if you see nullable columns so consistently, you're right to be suspicious. It might indicate that someone was being lazy, or afraid to declare NOT NULL
columns unequivocally. You can justify doing your own analysis.
Upvotes: 14
Reputation: 2956
They are a (very common) smell, I'm afraid. Look up C.J. Date writings on the topic.
Upvotes: 2
Reputation: 18984
No, whether or not a field should be nullable is a data concept and can't be a code smell. Whether or not NULLs are annoying to code has nothing to do with the usefulness of having nullable data fields.
Upvotes: 2
Reputation: 27561
This is all completely dependent on the scope and requirements of the project. I wouldn't use number of nullable fields alone as a metric for poorly written or designed code. Have a look at the business domain, if there are many non nullable fields represented there that are nullable in the database, then you have some issues.
Upvotes: 1
Reputation: 18516
Don't know if I consider it always a bad thing, but if the columns are being added because a single record (or maybe a few) need to have values while most don't, then it indicates a pretty flat table structure. If you're seeing column names like "addr1", "addr2", "addr3", then it stinks!
I would bet that most of the columns you have could be removed and represented in other tables. You could find the "non-null" ones through a foreign key relationship. This will increase the joins that you'll be doing, but it could be more preformant that doing a "where not col1 is null".
Upvotes: 8
Reputation: 2387
Default values are typically the exception and NULLs are the norm, in my experience.
True, nulls are annoying.
It's also extremely useful because null is the best indicator of "NO VALUE". A concrete default value is very misleading, and you can lose information or introduce confusion down the road.
Upvotes: 19
Reputation: 10397
I think so. If you don't need the data, then it's not important to your business. If it is important to your business, it should be required.
Upvotes: 1
Reputation: 4797
As a best practice, if a column shouldn't be nullable, then it should be marked as such. However, I don't believe in going completely insane with things like this.
Upvotes: 1