Reputation: 728
How far does one go to eliminate duplicate data in a database? Because you could go OTT and it would get crazy. Let me give you an example...
If I were to create a Zoo database which contains a table 'Animal' which has a 'name', 'species' and 'country_of_birth'
But there will be duplicate data there as many animals could come from same country and there could be lots of tigers, for example.
So really there should be a 'Species' table and a 'Country_of_birth' table But then after a while you would have tons of tables So how far do you go?
In this question I am just using one table as an example. One row in the Animal table stores information about a single animal in the zoo. So that animal's name, species and country of birth, as well as a unique animalID.
Upvotes: 1
Views: 927
Reputation: 95761
But there will be duplicate data there as many animals could come from same country and there could be lots of tigers, for example.
This suggests you want to keep track of individual animals, not just kinds of animals. Let's assume that the zoos use some kind of numeric tattoo or microchip to identify individual animals.
Assume this sample data is representative. (It's not, but it's ok for teaching.)
Animals
Predicate: Animal having microchip <chip_num> of species <species>
has name <name> and was born in <birth_country_code>.
chip_num name species birth_country_code
--
101234 Anita Panthera tigris USA
101235 Bella Panthera tigris USA
101236 Calla Panthera tigris USA
101237 Dingo Canis lupus CAN
101238 Exeter Canis lupus CAN
101239 Bella Canis lupus USA
101240 Bella Canis lupus CAN
There's no redundant data in that table. None of those columns can be dropped without radically changing the meaning of that table. It has a single candidate key: chip_num. It's in 5NF.
Values are repeated in non-key columns. That's kind of the definition of non-key (non-prime) columns. Values in key columns (or sets of key columns) are unique; values in non-key columns aren't.
If you want to restrict the values in "birth_country_code" to the valid three-letter ISO country codes, you can add a table of valid three-letter ISO country codes, and set a foreign key reference to it. This is generally a Good Thing, but it has nothing to do with normalization.
iso_country_code
--
CAN
USA
You could do the same thing again for "species". That, too, would generally be a Good Thing, and it, too, would have nothing to do with normalization.
Upvotes: 1
Reputation: 14418
It comes down to deciding what is important to your system.
Deciding whether something (your e.g. "country of birth") is merely an attribute or is instead a full-blown entity in its own right depends on what else your system needs to know about countries and how many attributes your system may track that are fully functionally dependent on the country.
You should also consider whether your attributes are susceptible to update anomalies. If your statement of country in the animal table is in the form of the full official name of the country, then you might be at risk if, for example, "The Belgian Congo" suddenly becomes "The Democratic Republic of the Congo" - oh wait, that already happened!
The rules of normalization are not sacrosanct. They are pretty darn useful rules of thumb that are intended to keep you out of trouble, most of the time. Still, rules are made to be broken - but you should only break them knowingly and with a carefully considered understanding of the consequences.
Upvotes: 0
Reputation: 146597
First you decide What the table is supposed to carry information about. In your example. is the table about individual animals? or is it about species of animals and how many of each species? The fact that you have country of birth might be an indicator that someone wants it to be the former. If that is the case you must have a key that identifies individual animals. You have an attribute, (a property) that is associated with individuals, so each row must (should?) represent an individual. You should read up Here on the database modeling concepts of Identity, and Individuation.
And to do this properly, actually, you do this for each thing in your data model, and then convert that model into database tables.
Upvotes: 0