Reputation: 77329
I'm designing a database application which stores simple contact information (First/Last Name etc.) and I also have to store phone numbers. Besides phone numbers I have to store what they are for (mobile, business etc.) and possibly an additional comment for each.
My first approach was to normalize and keep the phone numbers in a separate table, so that I have my 'Contacts' and my 'PhoneNumbers' table. The PhoneNumbers table would be like this:
Id int PK
ContactId int FK<->Contacts.Id
PhoneNumber nvarchar(22)
Description nvarchar(100)
However, it would make things a lot easier AND save a SQL Join on retrieval if I just stored this information as part of each contact's record (assuming that I limit the total # of phone numbers that can be stored, to say 4 numbers total).
However, I end up with an "ugly" structure like this:
PhoneNumber1 nvarchar(22)
Description1 nvarchar(100)
PhoneNumber2 nvarchar(22)
Description2 nvarchar(100)
etc. etc.
It looks amateurish to me but here are the advantages I see:
1) In ASP.NET MVC I can simply attach the input textboxes to my LINQ object's properties and I'm done with wiring up record adds and updates.
2) No SQL Join necessary to retrieve the information.
Unfortunately I am not very knowledgeable on issues such as table width problems (I read that this can cause problems if it grows too big/too many columns and that performance issues come up?) and then also it would mean that when I search for a phone number I'd have to look at 4 fields instead of 1 if I kept it in a separate table.
My application has about 80% search/data retrieval activity so search performance is an important factor.
I appreciate your help in finding the right way to do this. Separate table or keep it all in one? Thank you!
Upvotes: 2
Views: 6059
Reputation: 156178
An important principle of denormalization is that it does not sacrifice normalized data. You should always start with a schema that accurately describes your data. As such you should put different kinds of information in different kinds of tables. You should also put as many constraints on your data as you think is reasonable.
All of these goals tend to make queries a teeny bit longer, as you have to join different tables to get the desired information, but with the right names for tables and columns, this shouldn't be a burden from the point of view of readability.
More imporantly, these goals can have an affect on performance. You should monitor your actual load to see if your database is performing adequately. If nearly all of your queries are returning quickly, and you have lots of CPU headroom for more queries, then you're done.
If you find that write queries are taking long, make sure you don't denormalize your data. You will make the database work harder to keep things consistent, since it will have to do many reads followed by many more writes. Instead, you want to look at your indexes. Do you have indexes on columns you rarely query? Do you have indexes that are needed to verify the integrity of an update?
If read queries are your bottleneck, then once again, you want to start by looking at your indexes. Do you need to add an index or two to avoid table scans? If you just can't avoid the table scans, are there any things you could do to make each row smaller, like by reducing the number of characters in a varchar column, or splitting rarely queried columns into another table to be joined upon when they are needed.
If there is a specific slow query that always uses the same join, then that query might benefit from denormalization. First verify that reads on those tables strongly outnumber writes. Determine which columns you need from one table to add to the other. You might want to use a slightly different name to those columns so that it's more obvious that they are from denormalization. Alter your write logic to update both the original table used in the join, and the denormalized fields.
It's important to note that you aren't removing the old table. The problem with denormalized data is that while it accelerates the specific query it was designed for, it tends to complicate other queries. In particular, write queries must do more work to insure that the data remains consistent, either by copying data from table to table, by doing additonal subselects to make sure that the data is valid, or jump over other sorts of hurdles. By keeping the original table, you can leave all your old constraints in place, so at least those original columns are always valid. If you find for some reason that the denormalized columns are out of sync, you can switch back to the original, slower query and everything is valid, and then you can work on ways to rebuild the denormalized data.
Upvotes: 3
Reputation: 2313
I agree with @Tom that normalizing makes more sense and provides flexibility. If you get your indexes right you shouldn't suffer too much by doing a join between the tables.
As for your normalized table I'd add a type or code field so you can ID say Home, Home 1,Home 2, Business, Bus 1, Bus 2, Mobile, Mob1 etc...
Id int PK
ContactId int FK<->Contacts.Id
Code char(5)
PhoneNumber nvarchar(22)
Description nvarchar(100)
And store this type in a separate table with say other code/code description information
We tend to have a Code Group table with info such as
CODE_GROUP, CODE DESC
ST State
PH Phone Number
AD Address Type
And a CODE table with
CODE_ID, CODE_GROUP, DESCRIPTION
MB1 PH Mobile One
MB2 PH Mobile Two
NSW ST New South Wales
etc...
You can expand this out to have long description, short desc, ordering, filtering etc
Upvotes: 1
Reputation: 57815
This might be fine now, but what happens when someone wants a fifth phone number? Do you keep on adding more and more fields?
Another thing to consider is how would you run a query to say 'Give me all people and their mobile phone numbers', or 'Give me everyone with no phone number'? With a separate table, this is easy, but with one table the mobile number could be in any one of four fields so it becomes much more complicated.
If you take the normalised approach, and in future you wanted to add additional data about the phone number, you can simply add another column to the phone numbers table, not add 4 columns to the contacts table.
Going back to the first point about adding more phone numbers in future - if you do add more numbers, you will have to amend probably every query/bit of logic/form that works on data to do with phone numbers.
Upvotes: 6
Reputation: 41442
I'm in favor of the normalized approach. What if you decided that you wanted to add an "Extension" column for business phone numbers? You'd have to create the columns "Extension1", "Extension2", "Extension3", etc. This could become quite tedious to maintain at some point.
Then again, I don't think you can go too wrong either way. It's not like normalization/denormalization will take all that much time if you decided to switch to the other method.
Upvotes: 4
Reputation: 13028
How about an XML field in the Contacts table? It takes away the complexity of another table.
(Please correct me if this is a bad idea, I've never worked with XML fields before)
Upvotes: 0
Reputation: 25659
It won't likely cause problems to have the data denormalized like that, but I wouldn't suggest it. Even though it may be more complex to query, it's better to have well formed data that you can manipulate in a multitude of ways. I would suggest a database schema like this:
Contacts:
ID (Primary Key)
Name
Job Title
Phone Number Categories:
ID (Primary key)
Name
Phone Numbers:
ID (Primary Key)
Category_ID (Foreign Key -> Phone Number Categories.ID)
Contact_ID (Foreign Key -> Contacts.ID)
Phone Number
This allows you a lot of flexibility in the number of phone numbers allowed, and gives you the ability to categorize them.
Upvotes: 9