Pure.Krome

Reputation: 87087

Please help explain if I'm destroying my DB Schema for the sake of performance :(

I've had a database in production for nearly 3 years, on SQL Server 2008 (it was 2005 before that). It has been fine, but it isn't very performant, so I'm tweaking the schema and queries to help speed some things up. Also, to give you an estimate of sizes, a score of the main tables contain around 1-3 million rows each.

Here's a sample database diagram (sorry, it's under NDA so I can't display the original):

alt text http://img11.imageshack.us/img11/4608/dbschemaexample.png

Things to note (which are directly related to my problem) :-

Firstly, this looks like a normalised database schema. I suck at DB theory, so I'm guessing this is 3NF (at least) ... famous last words :)

Now, this is killing my database performance because these two outer joins and the inner join get called a lot, AND there are also a few more joins in many statements.

To try and fix this, I thought I might try an indexed view. Creating the view is a piece of cake. But indexing it doesn't work: you can't create indexed views over outer joins or self-referencing tables (which is another problem of mine :( ).
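For reference, here's a minimal sketch of what an indexed view looks like in SQL Server. The table and column names are guessed from the example diagram (the real schema is under NDA); the point is that the view needs SCHEMABINDING and two-part names, and the first index must be unique and clustered — and an outer join anywhere in the view blocks the index entirely:

```sql
-- Sketch only: dbo.Vehicles / dbo.Radios are stand-ins from the example diagram.
CREATE VIEW dbo.VehicleSummary
WITH SCHEMABINDING          -- required for an indexed view
AS
    SELECT v.VehicleId, v.Registration, r.Name AS RadioName
    FROM dbo.Vehicles AS v
    INNER JOIN dbo.Radios AS r  -- a LEFT JOIN here would make the view un-indexable
        ON r.RadioId = v.RadioId;
GO

-- The unique clustered index is what actually materialises the view.
CREATE UNIQUE CLUSTERED INDEX IX_VehicleSummary
    ON dbo.VehicleSummary (VehicleId);
```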

So, I've cried for hours (and /wrists, dyed my hair and wrote an emo song about it and put it on myfailspace) and did the following...

  1. Added a new row to each 'optional' outer-joined table (in this example, Radios and CupHolders): ID = 0, rest of the data = 'Unknown Blah' or 0's.
  2. Updated the parent tables so that any NULL foreign keys are now 0.
  3. Changed the relationships (and the queries) from outer joins to inner joins.
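The three steps above might be sketched like this (again, table and column names are guesses from the example, not the real NDA'd schema):

```sql
-- 1. Sentinel row in each optional table (ID 0 = "Unknown").
SET IDENTITY_INSERT dbo.Radios ON;
INSERT INTO dbo.Radios (RadioId, Name) VALUES (0, 'Unknown');
SET IDENTITY_INSERT dbo.Radios OFF;

-- 2. Repoint NULL foreign keys at the sentinel row.
UPDATE dbo.Vehicles SET RadioId = 0 WHERE RadioId IS NULL;

-- 3. Queries (and the view) can now use INNER JOIN instead of LEFT JOIN.
SELECT v.Registration, r.Name
FROM dbo.Vehicles AS v
INNER JOIN dbo.Radios AS r ON r.RadioId = v.RadioId;
```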

Now, this works. I can even make my indexed view, which is very fast now.

So ... i'm in pain. This just goes against everything I've been taught. I feel dirty. Alone. Infected.

Is this a bad thing to do? Is this a common scenario of denormalizing a database for the sake of performance?

I would love some thoughts on this, please :)

PS. That image is a random Google find -- so it's not mine.

Upvotes: 1

Views: 551

Answers (4)

repieper


I'm running into the same issue of performance vs academic excellence. We have a large view on a customer database with 300 columns and 91,000 records. We use outer joins to create the view, and the performance is pretty bad. We have considered changing to inner joins by putting in dummy records with a value of zero in the columns we join on (instead of NULL), to enable a unique index on the view.

I have to agree that if performance is important, sometimes strange things have to be done to make it happen. Ultimately, those who pay our bills don't care whether the architecture is perfect.

Upvotes: 0

paxdiablo

Reputation: 882756

Databases should always be designed and initially implemented in 3NF. But the world is a place of reality, not ideals, and it's okay to revert to 2NF (or even 1NF) for performance reasons. Don't beat yourself up about it; pragmatism beats dogmatism in the real world all the time.

Your solution, if it improves performance, is a good one. The idea of having an actual radio (for example), manufactured by nobody and having no features, is not a bad one - it's been done a lot before, believe me :-) The only reason you would leave that field NULL is to see which vehicles have no radio, and there's little difference between these queries:

select Registration from vehicles where RadioId is null
select Registration from vehicles where RadioId = 0

My first thought was to simply combine the four tables into one and hang the duplicate-data issue. Most problems with DBMSs stem from poor performance rather than low storage space.

Maybe keep that as your fallback position if your current de-normalized schema becomes slow as well.

Upvotes: 1

geofflane

Reputation: 2691

NULL values generally are not used in indexes. What you've done is provide a sentinel value so that the column always has a value, which allows your indexes to be used more effectively.

You didn't change the structure of your database either, so I wouldn't call this denormalizing. I've done the same with date values, where a NULL "end date" denoted "not ended yet". Instead, I made it a known date far in the future, which allowed for indexing.
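As a sketch of that end-date trick (the table and columns here are hypothetical, just to illustrate the sentinel):

```sql
-- Illustrative only: replace NULL "not ended yet" with a far-future sentinel.
UPDATE dbo.Memberships
SET EndDate = '99991231'      -- max date SQL Server can store
WHERE EndDate IS NULL;

-- "Still active" becomes an ordinary range predicate that can seek an index:
SELECT MemberId
FROM dbo.Memberships
WHERE EndDate > GETDATE();
```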

I think this is fine.

Upvotes: 2

duffymo

Reputation: 309028

"...So i'm tweaking the schema and queries to help speed some things up..." - I would beg to differ about this. It seems that you're slowing things down. (Just kidding.)

I like the Database Programmer blog. He has two columns for and against normalization that you might find helpful:

  1. http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html
  2. http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html

I'm not a DBA, but I think the evidence is in front of your eyes: Performance is worse. I don't see what splitting these 1:1 relationships into separate tables is buying you, but I'll be happy to take instruction.

Before I changed anything, I'd ask SQL Server for the execution plan of every query that was slow, and use that information to see exactly what should be changed. Don't guess because a normalization guru told you so; get the data to back up what you're doing. What you're doing sounds like optimizing middle-tier code without profiling. Gut feelings aren't very accurate.
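In SQL Server terms, one way to gather that evidence looks like this (the query itself is just an illustration using the example diagram's names):

```sql
-- Turn on per-query I/O and timing stats in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT v.Registration, r.Name
FROM dbo.Vehicles AS v
LEFT JOIN dbo.Radios AS r ON r.RadioId = v.RadioId;

-- To capture the plan itself, run this alone in its own batch,
-- then re-run the query to get the XML plan instead of results:
-- SET SHOWPLAN_XML ON;
```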

Upvotes: 0
