Catalin

Reputation: 11731

SQL Server: Many columns in a table vs Fewer columns in two tables

I have a database table (called Fields) which has about 35 columns. 11 of them always contain the same constant values for all of the roughly 300,000 rows - they act as metadata.

The downside of this structure is that, when I need to update the values of those 11 columns, I need to go and update all 300,000 rows.

I could move all the common data into a different table and update it only once, in one place, instead of in 300,000 places.

However, if I do it like this, then when I display the fields I need to create an INNER JOIN between the two tables, which I know makes the SELECT statement slower.

I should mention that updating the columns happens less often than reading (displaying) the data.

How do you suggest I store the data in the database to get the best performance?
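
For illustration, the current pain point looks roughly like this (the Fields table name comes from the question; the metadata column names below are made up for the example):

    -- Today the 11 metadata columns are repeated in every row,
    -- so changing one shared value touches all ~300,000 rows.
    UPDATE dbo.Fields
    SET MetaUnit  = 'kg',       -- hypothetical metadata column
        MetaOwner = 'Finance';  -- hypothetical metadata column (9 more like these in practice)
    -- (300000 rows affected)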

Upvotes: 1

Views: 2892

Answers (4)

Ravi Sankar Rao

Reputation: 1070

The best way is to separate the data out into a second table with those 11 columns - call it something like a MASTER DATA TABLE - and give it a primary key.

This primary key can then be referenced as a foreign key in those 300,000 rows in the first table.
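
A minimal sketch of that split, with hypothetical column names (only the Fields table name comes from the question):

    -- Hypothetical master data table holding the 11 shared metadata columns
    CREATE TABLE dbo.FieldsMetadata (
        MetadataId INT IDENTITY(1,1) PRIMARY KEY,
        MetaUnit   NVARCHAR(50) NOT NULL,
        MetaOwner  NVARCHAR(50) NOT NULL
        -- ... the remaining 9 metadata columns ...
    );

    -- The existing Fields table keeps its row-specific columns and
    -- references the metadata by key instead of repeating it.
    -- The column is added as NULL so existing rows can be backfilled,
    -- after which it can be altered to NOT NULL.
    ALTER TABLE dbo.Fields
        ADD MetadataId INT NULL
            CONSTRAINT FK_Fields_FieldsMetadata
            REFERENCES dbo.FieldsMetadata (MetadataId);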

Upvotes: 0

Branko Dimitrijevic

Reputation: 52157

This is a classic example of a denormalized design. Denormalization is sometimes done for (SELECT) performance, but it should always be done in a deliberate, measurable way. Have you actually measured whether you gain any performance from it?

If your data fits into cache, and/or the JOIN is unusually expensive [1], then there may well be some performance benefit from avoiding the JOIN. However, the denormalized data is larger and will push against the limits of your cache sooner, increasing the I/O and likely reversing any gains you may have reaped from avoiding the JOIN - you might actually lose performance.

And of course, getting incorrect data is useless, no matter how quickly you can do it. Denormalization makes your database less resilient to data inconsistencies [2], and the performance difference would have to be pretty dramatic to justify this risk.


[1] Which doesn't look to be the case here.

[2] E.g. have you considered what happens in a concurrent environment, where one application modifies the existing rows while another application inserts a new row that still carries the old values (the first application hasn't committed yet, so there is no way for the second application to know that a change is in progress)?
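
A sketch of that race against the denormalized table, shown as two sessions in one script for illustration (names are the hypothetical ones used elsewhere on this page; the second session writes a metadata value it read before the first session started, e.g. from an application-level cache):

    -- Session A: mass-update one of the 11 repeated metadata columns
    BEGIN TRANSACTION;
    UPDATE dbo.Fields SET MetaUnit = 'kg';
    -- ... not committed yet ...

    -- Session B: inserts a new row using the metadata value it read earlier;
    -- it cannot know Session A's change is pending
    INSERT INTO dbo.Fields (FieldName, MetaUnit) VALUES (N'NewField', 'lb');

    -- Session A:
    COMMIT;
    -- Result: the 300,000 updated rows say 'kg', the new row still says 'lb'.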

Upvotes: 0

TomTom

Reputation: 62159

I could move all the common data in a different table, and update it only one time, in one place, instead of 300.000 places.

I.e. sane database design and standard normalization.

This is not about "many empty fields"; it is, brutally, about tons of redundant data. Constants should be isolated into a separate table. This may also make things faster - it allows the database to use memory more efficiently, because your database is a lot smaller.
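
With the constants isolated (see the table sketch under the first answer, with its hypothetical names), the mass update from the question collapses to a single-row update:

    -- Normalized: the shared values live once in the metadata table,
    -- so an update touches 1 row instead of ~300,000.
    UPDATE dbo.FieldsMetadata
    SET MetaUnit  = 'kg',
        MetaOwner = 'Finance'
    WHERE MetadataId = 1;  -- the single metadata row all Fields rows point to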

Upvotes: 6

Anton Kovalenko

Reputation: 21517

I would suggest going with a separate table, unless you've left out something significant (of course, it would be better to try and measure, but I suspect you already know the answer).

You can actually get faster SELECTs as well: joining a small table is cheaper than fetching the same data 300,000 times.
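
A sketch of that read path, using the same hypothetical names as above; the joined table holds only the shared metadata rows, so it will typically stay cached and the join adds very little cost:

    -- Read the fields together with their shared metadata;
    -- the join target is a tiny table, so this is cheap.
    SELECT f.FieldId,        -- hypothetical key column of Fields
           f.FieldName,      -- hypothetical data column of Fields
           m.MetaUnit,
           m.MetaOwner
    FROM dbo.Fields AS f
    INNER JOIN dbo.FieldsMetadata AS m
            ON m.MetadataId = f.MetadataId;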

Upvotes: 2
