Reputation: 3155

SQL Table Design for Performance

I am designing a database for a web site, where performance is the top most priority.

the key functionality is based around two tables.

and these table have many-to-many relations ship.

to split this, i have added an extra table Table1_Table2 that then holds teh combination of both tabl'e primary key on per relation basis.

e.g. I have all my cars in Car table, and all my colors in color table

CarTable

ID(PK) - Name

1         - BMW
2         - Mercedes
3         - VW
4         - AUdi

ColorTable

ID(PK) - Color

1         - Blue
2         - Green
3         - Black
4         - Yellow

for many-to-many relation i have done this:

Car_ColorTable

ID(PK) - CarID - ColorID

1          - 1       - 2
2          - 1       - 4
3          - 2       - 4
4          - 3       - 1
5          - 4       - 1
6          - 4       - 3
7          - 4       - 3

is this a good design considering:

1) performance is the top priority.

2) the tables will have huge amount of data (more than 1 million records in both tables, and you can imagine how many rows will end up in Car_ColorTable.

if the above design is not the solution, how should i design this?

Upvotes: 0

Answers (5)

Branko Dimitrijevic

Reputation: 52157

Regarding the Car_ColorTable table, unless you actually want to allow multiple connections between the same car and color, or have some other specific reason for it, drop the surrogate key {Id} and just use the natural key which is the combination of car and color.

How exactly to do it depends on the queries you need to run:

If you need: "for given car, give me colors", create a composite clustered primary key on {CarID, ColorID}.
If you need "for given color, give me cars", create a composite clustered primary key on {ColorID, CarID}.
If you need both, create a primary key on {CarID, ColorID} and a secondary index on {ColorID, CarID}.
- Unless there are additional fields you have not shown us, make the PK clustered.
- If you do have additional fields, either use non-clustered (i.e. heap-based) table, or cover all the fields with these two indexes (the INCLUDE keyword might come in handy for that).

Upvotes: 0

UnhandledExcepSean

Reputation: 12804

Your design seems fine. Things to remember:

Indexes are your friend. Use them.
When talking performance, keep in mind that improving read performance normally incurs a small update/insert write performance hit.

A million records really isn't too much and you can have fast queries. Get a decent server with enough processing power and memory and you should be fine.

Upvotes: 1

Mike Sherrill 'Cat Recall'

Reputation: 95751

You probably meant to say that SELECT performance is the most important. But you can't let SELECT performance trump data integrity. Getting the wrong answer really fast is never a good requirement.

If you use surrogate keys (integers), your primary key should be primary key (car_id, color_id). The additional surrogate key "ID" is useless here, and will usually hurt SELECT performance. (More columns, wider rows, fewer rows per data page on disk, more disk I/O.)

You should test with both natural keys (car name and car color) and with surrogate keys. The surrogate keys will require two joins for every query; natural keys need no joins. Tables using surrogate keys (cars, colors) still need a unique constraint on the names. You don't want to discover later that "blue" has 13 different ID numbers.

Spend an hour or two writing a script to generate a couple of million rows with and without surrogate keys, and compare the performance.

Upvotes: 1

JMan

Reputation: 2629

This is the best way to map the relationship. Just make sure you know from what object you will usually try to aproach the relation and place your clustered index on that Column.

You can also choose to create a combined PK but then you cannot use any duplicates.

Upvotes: 2

WKordos

Reputation: 2255

it should be Car_ColorTable

CarID (PK) - ColorID (PK)

you dont want the id column there,
primary key should have both columns
you can create similar non clustered index with columns in oposite order that mean (ColorID, CarID)
and thats it

Upvotes: 3

SQL Table Design for Performance

Answers (5)

Related Questions