Reputation: 35408
I am just wondering which is more recommended when it comes about storing person names in a database.
The first is, that I have the table, with two dedicated columns, such as: FamilyName, GivenName, both of them are string type and we store the family name (or family names if there are more than one) and same with given names. This will end up with a structure like
+-------------------------------+
| PERSON |
+-------------+------------+----+
| FamilyName | Given Name | ID |
+-------------+------------+----+
| Doe | John | 1 |
| Doe | Peter | 2 |
| Williamson | Williams | 3 |
| Hansen | Williams | 4 |
+-------------+------------+----+
Or another approach is that I have a separated table where I store only names, and in the name table I have foreign key references to that table, and I have a more complex relationship person table, as represented below:
+-----------------+
| NAME_STRINGS |
+---+-------------+
|ID | VALUE |
+---+-------------+
| 1 | Doe |
| 2 | John |
| 3 | Hansen |
| 4 | Peter |
| 5 | Williamson |
| 6 | Williams |
+---+-------------+
+-------------------------------+
| PERSON |
+-------------+------------+----+
|FamilyNameId | GivenNameId| ID |
+-------------+------------+----+
| 1 | 2 | 1 |
| 1 | 4 | 2 |
| 5 | 6 | 3 |
| 3 | 6 | 4 |
+-------------+------------+----+
Please note, that I am not considering complex cases, such as double, triple or middle names, I am more interested in the concept of eliminating data duplication. Is it worth overcomplicating a solution like this for a database that will have around ~10 000 000 unique persons, but sharing a lot of common names?
Upvotes: 2
Views: 4867
Reputation: 4453
The answer depends on your situation but if you are not going to store any more data in your table about a specific name (what is the latin version, what's the history of the name etc.) then you would definitely go with one table with firstname
and lastname
columns.
From a normalization standpoint it would make sense if either firstname
or lastname
would be dependent on any other factor than the id
of the person. As it is, you can't tell a persons name by looking at his/her address
, position
or any other data stored in the database. Neither can you tell his lastname
by looking at his firstname
or vice versa. You can only say his name is functionally dependent of his uniquely individual id
.
If you had name_info
field with info like "John is derived from latin Johannes", then it would not make sense to repeat the info for every John in the database and thus it should be moved to another table.
Upvotes: 2
Reputation: 8591
The deep of data normalization depends on many factors. In this case i'd suggest to change nothing. Let's say there are hundreds of John Doe's. I think you need to get another attribute which uniqually identify each of them, for example: Personal ID.
For further information, please see:
Description of the database normalization basics
Introduction to Data Normalization: A Database "Best" Practice
Normalization of Database
Upvotes: 0