Reputation: 3155
I am designing a database for a web site, where performance is the top most priority.
the key functionality is based around two tables.
and these table have many-to-many relations ship.
to split this, i have added an extra table Table1_Table2 that then holds teh combination of both tabl'e primary key on per relation basis.
e.g. I have all my cars in Car table, and all my colors in color table
CarTable
ID(PK) - Name
1 - BMW
2 - Mercedes
3 - VW
4 - AUdi
ColorTable
ID(PK) - Color
1 - Blue
2 - Green
3 - Black
4 - Yellow
for many-to-many relation i have done this:
Car_ColorTable
ID(PK) - CarID - ColorID
1 - 1 - 2
2 - 1 - 4
3 - 2 - 4
4 - 3 - 1
5 - 4 - 1
6 - 4 - 3
7 - 4 - 3
is this a good design considering:
1) performance is the top priority.
2) the tables will have huge amount of data (more than 1 million records in both tables, and you can imagine how many rows will end up in Car_ColorTable.
if the above design is not the solution, how should i design this?
Upvotes: 0
Views: 746
Reputation: 52157
Regarding the Car_ColorTable
table, unless you actually want to allow multiple connections between the same car and color, or have some other specific reason for it, drop the surrogate key {Id}
and just use the natural key which is the combination of car and color.
How exactly to do it depends on the queries you need to run:
{CarID, ColorID}
.{ColorID, CarID}
.{CarID, ColorID}
and a secondary index on {ColorID, CarID}
.
Upvotes: 0
Reputation: 12804
Your design seems fine. Things to remember:
A million records really isn't too much and you can have fast queries. Get a decent server with enough processing power and memory and you should be fine.
Upvotes: 1
Reputation: 95751
You probably meant to say that SELECT performance is the most important. But you can't let SELECT performance trump data integrity. Getting the wrong answer really fast is never a good requirement.
If you use surrogate keys (integers), your primary key should be primary key (car_id, color_id)
. The additional surrogate key "ID" is useless here, and will usually hurt SELECT performance. (More columns, wider rows, fewer rows per data page on disk, more disk I/O.)
You should test with both natural keys (car name and car color) and with surrogate keys. The surrogate keys will require two joins for every query; natural keys need no joins. Tables using surrogate keys (cars, colors) still need a unique constraint on the names. You don't want to discover later that "blue" has 13 different ID numbers.
Spend an hour or two writing a script to generate a couple of million rows with and without surrogate keys, and compare the performance.
Upvotes: 1
Reputation: 2629
This is the best way to map the relationship. Just make sure you know from what object you will usually try to aproach the relation and place your clustered index on that Column.
You can also choose to create a combined PK but then you cannot use any duplicates.
Upvotes: 2
Reputation: 2255
it should be Car_ColorTable
CarID (PK) - ColorID (PK)
you dont want the id column there,
primary key should have both columns
you can create similar non clustered index with columns in oposite order that mean (ColorID, CarID)
and thats it
Upvotes: 3