Ferenc Deak
Ferenc Deak

Reputation: 35408

Best way to store person data (first name, last name) in database

I am just wondering which is more recommended when it comes about storing person names in a database.

The first is, that I have the table, with two dedicated columns, such as: FamilyName, GivenName, both of them are string type and we store the family name (or family names if there are more than one) and same with given names. This will end up with a structure like

+-------------------------------+
|              PERSON           |
+-------------+------------+----+
| FamilyName  | Given Name | ID |
+-------------+------------+----+
| Doe         | John       |  1 |
| Doe         | Peter      |  2 |
| Williamson  | Williams   |  3 |
| Hansen      | Williams   |  4 |
+-------------+------------+----+

Or another approach is that I have a separated table where I store only names, and in the name table I have foreign key references to that table, and I have a more complex relationship person table, as represented below:

+-----------------+
| NAME_STRINGS    |
+---+-------------+
|ID | VALUE       |
+---+-------------+
| 1 | Doe         |
| 2 | John        |
| 3 | Hansen      |
| 4 | Peter       |
| 5 | Williamson  |
| 6 | Williams    |
+---+-------------+

+-------------------------------+
|              PERSON           |
+-------------+------------+----+
|FamilyNameId | GivenNameId| ID |
+-------------+------------+----+
| 1           | 2          |  1 |
| 1           | 4          |  2 |
| 5           | 6          |  3 |
| 3           | 6          |  4 |
+-------------+------------+----+

Please note, that I am not considering complex cases, such as double, triple or middle names, I am more interested in the concept of eliminating data duplication. Is it worth overcomplicating a solution like this for a database that will have around ~10 000 000 unique persons, but sharing a lot of common names?

Upvotes: 2

Views: 4867

Answers (2)

Simo Kivistö
Simo Kivistö

Reputation: 4453

The answer depends on your situation but if you are not going to store any more data in your table about a specific name (what is the latin version, what's the history of the name etc.) then you would definitely go with one table with firstname and lastname columns.

From a normalization standpoint it would make sense if either firstname or lastname would be dependent on any other factor than the id of the person. As it is, you can't tell a persons name by looking at his/her address, position or any other data stored in the database. Neither can you tell his lastname by looking at his firstname or vice versa. You can only say his name is functionally dependent of his uniquely individual id.

If you had name_info field with info like "John is derived from latin Johannes", then it would not make sense to repeat the info for every John in the database and thus it should be moved to another table.

Upvotes: 2

Maciej Los
Maciej Los

Reputation: 8591

The deep of data normalization depends on many factors. In this case i'd suggest to change nothing. Let's say there are hundreds of John Doe's. I think you need to get another attribute which uniqually identify each of them, for example: Personal ID.

For further information, please see:
Description of the database normalization basics
Introduction to Data Normalization: A Database "Best" Practice
Normalization of Database

Upvotes: 0

Related Questions