Reputation: 1641
I am building a database of possible vehicles, where each entry has a Make, Model, Year, and Engine.
I split that into a tables of Makes (Ford, Chevrolet, BMW, etc) and Models (Impala, Camaro, F-150, etc) and Years (1920, ... 2012) and Engines (327, 350, etc).
Since I now have a table for each of Make, Model, Year, and Engine, and they can each have unique primary key, every row in the main "MakesModelsAndYears" table is just made up of four foreign keys.
Is this overkill, or really stored more efficiently than just one big table where I created unique indexes? My concern with the "one big table" approach is that the years, like 1970, would be repeated many times (1970 Chevrolet Impala, 1969 Chevrolet Camaro, etc) as would have model and even engine.
Thanks for any guidance!
Followup:
For those following along, I incorporated the feedback in the answers and arrived that this schema. The image doesn't show the FKs in detail, but they're effectively what the answer suggested:
Upvotes: 4
Views: 1000
Reputation: 95761
Chevrolet doesn't make a Mustang. Ford didn't make a Mustang in 1960. Your structure will allow a lot of nonsense.
The issue isn't that every column is a foreign key; there's nothing necessarily wrong with that. The issue is that the foreign keys are wrong.
I split that into a tables of Makes (Ford, Chevrolet, BMW, etc) and Models (Impala, Camaro, F-150, etc) and Years (1920, ... 2012) and Engines (327, 350, etc).
And that's why they're wrong. When you normalize a relation, you start with the relation, identify the candidate keys, and work out the functional dependencies. Just making single-column "lookup" tables for every column isn't normalization, and it doesn't constrain your data in the way that's needed. (And in this particular case, the constraints are the missing piece, not normalization to 5NF.)
Make Model Yr Engine
--
Ford F-150 2012 3.7L V6
Ford F-150 2012 3.5L V6 EcoBoost
Ford F-150 2012 5.0L V8
Ford F-150 2012 6.2L V8
Ford F-150 2011 3.7L V6
Ford F-150 2011 3.5L V6 EcoBoost
Ford F-150 2011 5.0L V8
Ford F-150 2011 6.2L V8
Chevrolet Camaro 2012 3.6L V6
Chevrolet Camaro 2011 3.6L V6
Chevrolet Camaro 2011 6.2L V8
Chevrolet Camaro 1980 229ci V6
Chevrolet Camaro 1980 267ci V8
Chevrolet Camaro 1980 305ci V8
Cadillac CTS 2004 3.6L V6
Vauxhall Astra 1979 1.3L
Vauxhall Astra 1979 1.6L
Vauxhall Astra 1979 1.8L
Opel Astra 1979 1.5L
Opel Astra 1979 2.0L
It should be clear that the only candidate key is {Make, Model, Yr, Engine}. So this table is all key, and it has no non-prime attributes.
To add "lookup" tables as constraints on data, it's not good enough to say that in the first column you have to choose from {Ford, Chevrolet, Cadillac, Vauxhall, Opel}, and in the second column you have to choose from {F-150, Camaro, CTS, Astra}. The right "lookup" table for make and model includes both make and model; you choose from {Ford F-150, Chevrolet Camaro, Cadillac CTS, Vauxhall Astra, Opel Astra}. (In this case, it goes even further. See the table model_years below.)
create table makes (
make varchar(25) primary key
);
insert into makes values
('Ford'),
('Chevrolet'),
('Cadillac'),
('Vauxhall'),
('Opel');
create table models (
make varchar(25) not null references makes (make),
model varchar(25) not null,
primary key (make, model)
);
insert into models values
('Ford', 'F-150'),
('Chevrolet', 'Camaro'),
('Cadillac', 'CTS'),
('Vauxhall', 'Astra'),
('Opel', 'Astra');
create table model_years (
make varchar(25) not null,
model varchar(25) not null,
year integer not null check (year between 1900 and 2050),
primary key (make, model, year),
foreign key (make, model) references models (make, model)
);
insert into model_years values
('Ford', 'F-150', 2012),
('Ford', 'F-150', 2011),
('Chevrolet', 'Camaro', 2012),
('Chevrolet', 'Camaro', 2011),
('Chevrolet', 'Camaro', 1980),
('Cadillac', 'CTS', 2004),
('Vauxhall', 'Astra', 1979),
('Opel', 'Astra', 1979);
create table model_year_engines (
make varchar(25) not null,
model varchar(25) not null,
year integer not null,
engine varchar(25) not null,
primary key (make, model, year, engine),
foreign key (make, model, year) references model_years (make, model, year)
);
insert into model_year_engines values
('Ford', 'F-150', 2012, '3.7L V6'),
('Ford', 'F-150', 2012, '3.5L V6 EcoBoost'),
('Ford', 'F-150', 2012, '5.0L V8'),
('Ford', 'F-150', 2012, '6.2L V8'),
('Ford', 'F-150', 2011, '3.7L V6'),
('Ford', 'F-150', 2011, '3.5L V6 EcoBoost'),
('Ford', 'F-150', 2011, '5.0L V8'),
('Ford', 'F-150', 2011, '6.2L V8'),
('Chevrolet', 'Camaro', 2012, '3.6L V6'),
('Chevrolet', 'Camaro', 2011, '3.6L V6'),
('Chevrolet', 'Camaro', 2011, '6.2L V8'),
('Chevrolet', 'Camaro', 1980, '229ci V6'),
('Chevrolet', 'Camaro', 1980, '267ci V8'),
('Chevrolet', 'Camaro', 1980, '305ci V8'),
('Cadillac', 'CTS', 2004, '3.6L V6'),
('Vauxhall', 'Astra', 1979, '1.3L'),
('Vauxhall', 'Astra', 1979, '1.6L'),
('Vauxhall', 'Astra', 1979, '1.8L'),
('Opel', 'Astra', 1979, '1.5L'),
('Opel', 'Astra', 1979, '2.0L');
No engine can go in this table unless its row first exists in model_years. No year can go in model_years unless its row first exists in models. And no row can go in models unless its row first exists in makes.
You can make a good case for using ON UPDATE CASCADE
in a schema like this. You can also make a good case for not using it. Oracle doesn't support ON UPDATE CASCADE
, which is one reason you see ID numbers peppering Oracle tables, and why you sometimes see people say "Primary key values must never change."
These are the kinds of tables you need to implement your known requirements.
Upvotes: 6
Reputation: 5101
At first blush I see all those tables as pointless. Perhaps MakesModelsAndYears
is all you need. And then I'd reconsider its name. At the very least I'd drop the "And". At best I'd rename it "Cars"
Table Keys and Data Relationships
Not necessarily the same thing. A primary key uniquely identifies the rows of a given table. That's all. A foreign key is a "guarantee" that a given value exists on some other table. Data can be related by other than formally defined keys. We sometimes call these candidate keys. Oh, there is no law that says you must have a primary key in any given table.
I've worked with databases where I often joined tables on other than primary and/or foreign keys. That just how the real data relationships worked out.
Data Normalization
As a general rule you want to avoid data redundancy across tables. However if your Years
table row has only one column - "year" then what's the point of that (ditto for those other tables as well)? You essentially duplicate that data in MakeModelsAndYears
pointing to it.
And if you do keep Model
, Engine
, Year
, Makes
tables don't make the stupid, boneheaded mistake of creating "ID" column in your Makes
table (ditto for all the tables) just so you don't have to store "Chevrolet" in the MakeModelAndYears
table. Imagine looking at that table and all you see are numbers in row after row!! To display meaningful information you'll have to make lots of Joins - just to say "1960 Chevy 454 Hemi Impala". Now that is inefficient!
Indexes
What indexes you make is driven by how you'll be looking up data and the database's performance when doing so.
Make indexes in particular for those columns that you may be joining or searching on but are not formal keys.
Also multi-column indexes are great for those columns that you frequently search for together. And the index column order matters. I've seen a very significant difference when all I did was reverse the column order in a certain index. Your mileage may vary (pun intended). And yes, differently ordered indexes on the same columns may make sense depending on table sizes, join column order, lookup frequency. You'll just have to look at these things if/when you have performance issues.
Upvotes: 0
Reputation: 16755
The PK/FK relationship is a really powerful one that opens up opportunities for you in your database. With that being said, it isn't always appropriate. A lot of that depends on the database and how you will use it. A single table for the above data will help with faster data access and easier reporting but it sacrifices scalability and single-instance data storage.
For your particular case, I would suggest putting the years back in the main table. Since the year is an int, duplicating isn't a big deal. I would also say that you should probably link your models table to your makes table since the makes are connected to the models.
Upvotes: -1
Reputation: 115640
There is no issue at all having a table with 2, 3, 4 or more Foreign Keys, and the Primary Key being the combination of those FKs, if that fits with your model.
The only problem I see with this design is that it allows a "BMW Escort" or a "Ford Z4". Perhaps you can change the design to:
Makes
-----
Make PK
Models
------
Make PK, FK to Makes
Model PK
MakesModelsAndYears
-------------------
Year PK, FK1 to Years
Make PK, FK2 to Model
Model PK, FK2
EngineSize PK, FK3 to Engines
Upvotes: 6