James

Reputation: 82136

Should I protect my database from invalid data?

I always tend to "protect" my persistence layer from violations via the service layer. However, I am beginning to wonder if it's really necessary. What's the point in taking the time to make my database robust, building relationships and data integrity, when it never actually comes into play?

For example, consider a User table with a unique constraint on the Email field. I would naturally want to write blocker code in my service layer to ensure the email being added isn't already in the database before attempting to add anything. In the past I never really saw anything wrong with that; however, as I have been exposed to more and more best practices and design principles, I feel that this approach isn't very DRY.
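For illustration, the kind of blocker code I have in mind looks roughly like this (just a sketch, not my actual code; the table, column, and function names are made up, and SQLite stands in for the real database):

    import sqlite3

    # Illustrative schema: a User table with a unique constraint on Email.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")

    def add_user(email: str) -> None:
        # Service-layer "blocker" check: look for the email before inserting.
        exists = conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone()
        if exists is not None:
            raise ValueError("email already registered")
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()

    add_user("a@example.com")   # succeeds
    add_user("a@example.com")   # raises ValueError: rejected before the INSERT is even attempted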

So, is it correct to always ensure data going to the persistence layer is indeed "valid", or is it more natural to let the invalid data reach the database and handle the error?

Upvotes: 1

Views: 1148

Answers (3)

Tim M.

Reputation: 54377

Even though there isn't a conclusive answer, I think it's a great question.

First, I am a big proponent of including at least basic validation in the database and letting the database do what it is good at. At minimum, this means foreign keys, NOT NULL where appropriate, strongly typed fields wherever possible (e.g. don't put a text field where an integer belongs), unique constraints, etc. Letting the database handle concurrency is also paramount (as @Branko Dimitrijevic pointed out) and transaction atomicity should be owned by the database.
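As a minimal sketch of that baseline (invented table names, SQLite syntax standing in for whatever DBMS you actually use):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs foreign keys enabled explicitly

    # Baseline validation owned by the database itself, declared once:
    conn.executescript("""
    CREATE TABLE countries (
        code TEXT PRIMARY KEY
    );
    CREATE TABLE users (
        id           INTEGER PRIMARY KEY,
        email        TEXT    NOT NULL UNIQUE,                    -- NOT NULL + unique constraint
        age          INTEGER NOT NULL CHECK (age >= 0),          -- strongly typed + CHECK
        country_code TEXT    NOT NULL REFERENCES countries(code) -- foreign key
    );
    """)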

If this is moderately redundant, then so be it. Better too much validation than too little.

However, I am of the opinion that the business tier should be aware of the validation it is performing even if the logic lives in the database.

  • It may be easier to distinguish between exceptions and validation errors. In most languages, a failed data operation will probably manifest as some sort of exception. Most people (me included) are of the opinion that it is bad to use exceptions for regular program flow, and I would argue that email validation failure (for example) is not an "exceptional" case.

    Taking it to a more ridiculous level, imagine hitting the database just to determine if a user had filled out all required fields on a form.

    In other words, I'd rather call a method IsEmailValid() and receive a boolean than have to work out whether the database error that was thrown meant the email was already in use by someone else (see the sketch after this list).

    This approach may also perform better, and avoid annoyances like skipped IDs because an INSERT failed (speaking from a SQL Server perspective).

    The logic for validating the email might very well live in a reusable stored procedure if it is more complicated than simply a unique constraint.

    And ultimately, that simple unique constraint provides final protection in case the business tier makes a mistake.

  • Some validation simply doesn't need to make a database call to succeed, even though the database could easily handle it.

  • Some validation is more complicated than can be expressed using database constructs/functions alone.

  • Business rules across applications may differ even against the same (completely valid) data.

  • Some validation is so critical or expensive that it should happen prior to data access.

  • Some simple constraints like field type/length can be automated (anything run through an ORM probably has some level of automation available).
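To illustrate the boolean-check idea from the first point (a sketch only; is_email_valid is an invented Python stand-in for something like IsEmailValid(), and the regex is deliberately simplistic):

    import re
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

    def is_email_valid(email: str) -> bool:
        # Cheap checks first: no database round trip needed to catch these.
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
            return False
        # Uniqueness expressed as a plain boolean rather than a thrown exception.
        taken = conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone()
        return taken is None

The unique constraint on the table is still there as the final backstop if two requests slip past this check at the same time.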

Upvotes: 1

Branko Dimitrijevic

Reputation: 52127

Please don't do that.

Implementing even "simple" constraints such as keys is decidedly non-trivial in a concurrent environment. For example, it is not enough to query the database in one step and allow the insertion in another only if the first step returned an empty result - what if a concurrent transaction inserted the same value you are trying to insert (and committed) in between your steps one and two? You have a race condition that could lead to duplicated data. Probably the simplest solution is a global lock to serialize transactions, but then scalability goes out the window...
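To make the window concrete (a contrived sketch; note the table deliberately has no unique constraint, so nothing stops the race):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT NOT NULL)")  # no UNIQUE constraint here

    def add_user_unsafe(email: str) -> None:
        exists = conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone()
        # <-- the race window: a concurrent transaction can INSERT and COMMIT
        #     the same email right here, after our check but before our INSERT.
        if exists is None:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            conn.commit()

With two sessions interleaving in that window, both pass the check and both rows land. Add the unique constraint and the second INSERT fails instead, keeping the data consistent no matter what the application code does.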

Similar considerations exist for other combinations of INSERT / UPDATE / DELETE operations on keys, as well as other kinds of constraints such as foreign keys and even CHECKs in some cases.

DBMSes have devised very clever ways over the decades to be both correct and performant in situations like these, while still letting you define constraints in a declarative manner, minimizing the chance for mistakes. And all the applications accessing the same database will automatically benefit from these centralized constraints.

If you absolutely must choose which layer of code shouldn't validate the data, the database should be your last choice.

So, is it correct to always ensure data going to the persistence layer is indeed "valid" (service layer) or is it more natural to let the invalid data get to the database and handle the error?

Never assume correct data and always validate at the database level, as much as you can.

Whether to also validate in upper layers of code depends on the situation, but in the case of key violations, I'd let the database do the heavy lifting.
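In code, letting the database do the heavy lifting means skipping the pre-check, attempting the INSERT, and translating the key violation if one occurs (a sketch; sqlite3.IntegrityError stands in for whatever duplicate-key error your driver raises):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

    def add_user(email: str) -> bool:
        # No pre-check: just insert and let the constraint arbitrate concurrency.
        try:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            conn.commit()
            return True
        except sqlite3.IntegrityError:
            conn.rollback()
            return False  # email already taken, as reported by the database itself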

Upvotes: 2

Tony Hopkinson

Reputation: 20320

Two reasons to do it. The DB may be accessed from another application.

You might make a wee error in your code and put invalid data in the DB; because your service layer operates on the assumption that this could never happen, it falls over if you are lucky, with silent data corruption being the worst case.

I've always looked at rules in the DB as a backstop for that exceptionally rare occasion when I make a mistake in the code. :)

The thing to remember is that, if you need to, you can always relax a constraint; tightening one after your users have spent a lot of effort entering data will be far more problematic.

Be really wary of that word "never"; in IT, it means much sooner than you wished.

Upvotes: 1
