Mazatec
Mazatec

Reputation: 11649

What's better - many small tables or one big table?

I've got a database which will store profiles about individuals. These individuals have about 50 possible fields.

Some are common things like, first name, last name, email, phone number.

Others are things like hobbies, skills, interests

Some are height, weight, skin color.

Each of these groups are used by the system at different times. In terms of being able to negotiate through the database I would prefer to have 7 tables each of about 8 fields. What is best practice to do?

EDIT: The data is going to be used in a search engine, for finding profile matches. Does this affect what I am doing?

Upvotes: 52

Views: 48442

Answers (9)

NealB
NealB

Reputation: 16928

I'm with the Normalize camp.

Here are a few hints to get you started:

Start with a process to assign some arbitrary unique identifier to each "person". Call this the PersonId or something like that. This identifier is called a surrogate key. The sole purpose of a surrogate key is to guarantees a 1 to 1 relationship between it and a real person in the real world. Use the surrogate key when associating the value of some other attribute to a "person" in your database.

As you develop your database layout you may find surrogate keys necessary (or at least useful) for some other attributes as well.

Look at each attribute you want to manage. Ask the following question: Does any given person have only one value for this attribute?

For example, each person has exactly one "Birth Date". But how may "Hobbies" can they have? Probably zero to many. Single valued attributes (eg. Birth date, height, weight etc.) are candidates to go into a common table with PersonId as the key. The number of attributes in each table should not be of concern at this point.

Multi valued attributes such as Hobby need a slightly different treatment. You might want to create separate tables for each multi-valued attribute. Using Hobbies as an example you might create the following table PersonHobby(PersonId, Hobby). A row in this table might look something like: (123, "Stamp Collecting"). This way you can record as many hobbies as required for each person, one per row. Do the same for "Interest", "Skill" etc.

If there are quite a number of multi-valued attributes where the combination of PersonId + Hobby determine nothing else (ie. you don't have anything interesting to record about this person doing this "Hobby" or "Interest" or "Skill") you could lump them into an Attribute-Value table having a structure something like PersonAV(PersonId, AttributeName, Value). Here a row might look like: (123, "Hobby", "Stamp Collecting").

If you go this route, it is also a good idea to substitute the AttributeName in the PersonAV table for a surrogate key and create another table to relate this key to its description. Something like: Attribute(AttributeId, AttributeName). A row in this table would look something like (1, "Hobby") and a corresponding PersonAV row could be (123, 1, "Stamp Collecting"). This is commonly done so that if you ever need to know which AttributeNames are valid in your database/application you have a place to look them up. Think about how you might validate whether "Interest" is a valid value for AttributeName or not - if you haven’t recorded some person having that AttributeName then there is no record of that AttributeName on your database - how do you know if it should exist or not? Well look it up in the Attribute table!

Some attributes may have multiple relationships and that too will influence how tables are normalized. I didn't see any of these dependencies in your example so consider the following: Suppose we have a warehouse full of parts, the PartId determines its WeightClass, StockCount and ShipCost. This suggests a table something like: Part(PartId, WeightClass, StockCount, ShipCost). However if relationship exists between non-key attributes then they should be factored out. For example suppose WeightClass directly determines ShipCost. This implies that WeightClass alone is enough to determine ShipCost and ShipCost should be factored out of the Part table.

Normalization is a fairly subtle art. You need to identify the functional dependencies that exist between all of the attributes in your data model in order to do it properly. Just coming up with the functional dependencies takes a fair bit of thought and consideration - but it is critical to getting to the proper database design.

I encourage you to take the time to study normalization a bit more before building your database. A few days spent here will more than pay for itself down the road. Try doing some Google/Wikipedia searches for "Functional Dependency", "Normalization" and "Database Design". Read, study, learn, then build it right.

The suggestions I have made with respect to normalizing your database design are only a hint as to the direction you might need to take. Without having a strong grasp of all the data you are trying to manage in your application, any advice given here should be taken with a "grain of salt".

Upvotes: 46

RKh
RKh

Reputation: 14161

I would recommend few tables. Over normalization is difficult to manage and you would end up writing complex queries which ends up with slow performance.

Normalize only when absolutely needed and think in logical terms. With the limited information you provided above, I would go for three tables:

Table 1: PersonalDetails Table 2: Activities Table 3: Miscellaneous

There are other techniques to speed up the performance like clustering etc., which you can use depending upon your need.

Upvotes: 10

David Oneill
David Oneill

Reputation: 13065

Unless every person has the same number of hobbies (IE everyone has 2 hobbies listed), it should be normalized.

Fields that are always 1 to 1 with the person should be in the same table. Age for example. No person will have two different ages.

Upvotes: 5

Raj More
Raj More

Reputation: 48016

IMO, it is more important to worry about the quality of data stored than the number of tables that you need.

For example, do you need to track changes? If John was 5'2" in January 2007 and is 5'11" in Oct 2010, do you want to know? If so, you will need to separate out the person from the height into two tables.

How about hobbies - are they allowed to only have 3 hobbies? Can they have more / less? Is this something you would want to query in the future? If so, you need a separate table.

You should read up on database design and normalization (there are several excellent threads on this site itself).

https://stackoverflow.com/questions/tagged/normalization

Upvotes: 7

Novikov
Novikov

Reputation: 4489

There is not database organization that's 100% correct, there's only one that's good enough for your purposes. If you don't foresee surpassing the capabilities of a single good database server in the future, then normalize the data and use plenty of constraints such as foreign keys, cascading deletes and such as that will make your database a joy to work with. On the other hand if you look at the databases of a lot of applications that have billions of requests you'll find that they forgo a lot of these niceties in the name of performance and scalability.

Upvotes: 4

Kevin O'Donovan
Kevin O'Donovan

Reputation: 1669

From what you've described I'd certainly break that into multiple tables. I wouldn't split on an arbitrary number of columns though, instead try to think of logical collections of columns that either make up an entity or match the access patterns you're going to be using to hit the data

Upvotes: 6

Adeel
Adeel

Reputation: 19228

many small tables i.e. Normalization is best here. it provides flexiblility, reduces redundancy and a better database organization.

Upvotes: 3

Matthew J Morrison
Matthew J Morrison

Reputation: 4403

There is no correct answer to this question because it largely depends on when and how you are going to be using your data, how frequently it will change, and what the volume of usage will be on the database.

What I would personally do would be to organize your data into logical entities and create tables based on those entities. This is at least where I would start.

Upvotes: 3

Jim
Jim

Reputation: 18853

It is hard to say, and is based on what the application requires. I would say to look into Database Normalization as it will show you how to normalize the database and in that it should shed light on what you would want to separate out into their own tables etc.

Upvotes: 39

Related Questions