Ryan

Reputation: 5682

More efficient to have two tables or one table with tons of fields

Related but not quite the same thing: which is more efficient? (or at least reading through it didn't help me any)

So I am working on a new site (selling insurance policies). We already have several sites up that do this (it's a Rails application), and I have a table in my SQL database called policies. As you can imagine, it has lots of columns to support all the different options available.

While working on this new site I realized I needed to keep track of 20+ more options.

My concern is that the policies table is already large, but the columns in it right now are almost all used by every application we have. If I add these new ones, they would only be used by the new site and would leave tons of null cells in all the rest of the policies.

So my question is: do I add those to the existing table, or create a new table just for the policies sold on that site? I also believe that if I created a new table, I could leave out some of the columns (but not very many) from the main policies table, because they are not needed for this application.
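To make the two options concrete, here is a minimal sketch (MySQL-flavoured; every table and column name below is made up for illustration, since the real schema isn't shown):

    -- Option A: widen the existing table with nullable, site-specific columns.
    ALTER TABLE policies
      ADD COLUMN new_site_option_1 VARCHAR(50) NULL,
      ADD COLUMN new_site_option_2 VARCHAR(50) NULL;
      -- ...and 20+ more, left NULL on every policy not sold through the new site.

    -- Option B: a one-to-one extension table, with rows only for the new site.
    CREATE TABLE new_site_policy_details (
      policy_id         INT PRIMARY KEY,
      new_site_option_1 VARCHAR(50),
      new_site_option_2 VARCHAR(50),
      FOREIGN KEY (policy_id) REFERENCES policies(id)
    );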

Upvotes: 1

Views: 247

Answers (1)

Jon Hanna

Reputation: 113262

"[A]lmost all used" suggests that you could, upon considering it, split it more naturally.

Now, much of the efficiency concern here comes down to three things:

  1. A single table can be scanned through more quickly than joins across several.
  2. Large rows have a memory and disk-space cost in themselves.
  3. If a single table represents something that is really a one-to-many relationship, then it requires more work on insert, delete, or update.

Point 2 only really comes into play if there are a lot of cases where you need one particular subset of the data, another batch of cases where you need a different subset, and maybe just a few where you need them all. If you're using most of the columns in most places, then splitting gains you nothing; in that case, splitting tables is bad.

Points 1 and 3 argue for and against joining everything into one big table, respectively.
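To make points 1 and 3 concrete, a rough sketch (reusing the hypothetical new_site_policy_details table from the question; none of these names come from the real schema):

    -- Point 1: with one wide table, a report is a single scan:
    SELECT policy_number, premium, new_site_option_1
    FROM policies;

    -- With the split, the same report pays for a join:
    SELECT p.policy_number, p.premium, d.new_site_option_1
    FROM policies p
    LEFT JOIN new_site_policy_details d ON d.policy_id = p.id;

    -- Point 3: if one row flattens a one-to-many (say driver_1, driver_2, ...),
    -- adding a third driver means altering the table and rewriting wide rows,
    -- where a proper child table would take a one-row INSERT.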

Before any of that though, let's get back to "almost all". If there are several rows with a batch of null fields, why? Often answering that "why?" reveals that there's really a natural split there, which should be broken off into another table as part of normal normalisation*. Repetition of fields is an even stronger sign that this is the case.

Do this first.
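For instance, a minimal sketch with invented names: suppose vehicle columns sit null on every non-auto policy, and drivers repeat as driver_1, driver_2. Both smells split off naturally:

    -- The batch of null-prone vehicle columns becomes its own table:
    CREATE TABLE policy_vehicles (
      policy_id  INT PRIMARY KEY,
      vin        VARCHAR(17),
      model_year INT,
      FOREIGN KEY (policy_id) REFERENCES policies(id)
    );

    -- The repeated driver_N fields become a proper one-to-many:
    CREATE TABLE policy_drivers (
      id          INT PRIMARY KEY AUTO_INCREMENT,
      policy_id   INT NOT NULL,
      driver_name VARCHAR(100),
      FOREIGN KEY (policy_id) REFERENCES policies(id)
    );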

To denormalise - whether by splitting what is naturally one table, or joining what is naturally several - is a very particular type of optimisation: it makes some things more efficient at the cost of making other things less efficient, and it introduces possibilities of bugs that don't exist otherwise. I would never say you should never denormalise - I do it myself - but you need to be able to say "I am denormalising tables X and Y in this manner, because it will help case C, which happens often enough, and I can live with the extra cost to case D". Then you need to check that it actually did help case C significantly and hurt case D insignificantly, while also looking for hidden costs.
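As a hedged sketch of what that justification might look like (the cached column, and a hypothetical policy_riders(policy_id, premium) child table, are invented for illustration):

    -- Case C (frequent): list each policy's total cost without an aggregate
    -- join, by caching the rider sum on the wide table:
    ALTER TABLE policies
      ADD COLUMN rider_premium_total DECIMAL(10,2) NOT NULL DEFAULT 0;

    SELECT policy_number, premium + rider_premium_total AS total_cost
    FROM policies;

    -- Case D (rarer, now costlier and bug-prone): every rider change must
    -- also refresh the cache, or the two copies drift apart:
    UPDATE policies
    SET rider_premium_total =
        (SELECT COALESCE(SUM(premium), 0)
         FROM policy_riders
         WHERE policy_id = 42)
    WHERE id = 42;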

One of the reasons for normalising in the first place is that it gives good average performance over a wide range of cases; it's the balance you want most of the time. Denormalising from the get-go, rather than starting from a normalised database, is almost always premature.

*Fun trivia fact: the name "normalization" was in part a take on Richard Nixon's "Vietnamisation" policy, meaning there was a running joke in some quarters of adding "-isation" onto just about anything. Were it not for the White House's reaction to the Tet Offensive, we could be using the gerund "normalising", or something completely different, instead.

Upvotes: 3
