Reputation: 10604
When designing a new application/database for multiple users, it usually involves having a table for user management.
The table usually always has Username as well as ID....
Just to check that I am not being an idiot (well, you may still think that!) I have just downloaded and taken a look at the schema of both MediaWiki and PHPBB, and they do the same.
When I first learnt about relational databases, I was always told that a big rule was never to duplicate data or do anything Unnecessary.
So, why is it that we have ID as the primary key instead of Username?
I understand the reason if it was not unique (such as the SO series of sites), however in these applications, it is.
The only thing I could think of is it is quicker to do a Select * from xxx where ID="454"
instead of Select * from xxx where name="some_really_long_name"
or because having really long names would increase the database size a lot.
Are these the only reasons, or have I missed something here?
Upvotes: 3
Views: 388
Reputation: 146449
Keys serve two disparate purposes, one is to prevent insertion of duplicate rows.... that doesn't mean that the data values are all the same, it means that the two rows represent the same real-world entity. Only a meaningful natural key can accomplish this.
The second purpose is to act as targets of foreign key columns in dependent tables.
For this, the narrowest, (smallest number of bytes) key will generate the best performance on the index that goes with the key and is used when performing searches.
So when the natural key consists of multiple columns, or is very wide, then sometimes it is advisable to create a second alternate, or surrogate key to be used as the target of FK references in other tables. This is generally an internally created value, created within the database, which is not exposed outside the application or system, perhaps not even outside the database component itself.
If that was then the only key, as it is not a meaningful, or natural key, it is totally insufficient to ensure data consistency, as two rows which represent the same entity, and differ in all their attributes only by the meaningless surrogate key, can still be inserted into the table.
Therefore, in such situations, it is a good practice to have both keys on the table.
In addition to often being used to increase performance, surrogate keys have the additional advantedge (Because they are non-meaningful) of not ever having to be changed. Coming up with the best possible natural key, that accurately and uniquely identifies the entity, and will never need to be changed, is an art form and can easily be done poorly. (SSANs are a canonical example) Then, if the real world entity changes any of the values used in a poorly designed natural key, and you are using it as the only key, (and therefore as FKs elsewhere) you will have to change the value everywhere in the database, including in all the other tables where you are using it as a foreign key.
Upvotes: 7
Reputation: 332531
...why is it that we have ID as the primary key instead of Username?
The first rule of data modelling is that you do not expose your keys. This is because:
A natural key is only a key that exists, as-is or as a commonly held convention. For example - the abbreviation of a US state makes for an ideal key because:
An artificial (AKA surrogate) key does not have any relation to the data - ID #1234 could refer to anything.
In theory, use artificial keys as much as possible. Because ID 1234 could mean anything, when you use natural keys there's less need to have to join to know what a given id value means. In reality, because of both performance and that natural keys aren't very common - use artificial keys.
Upvotes: 4
Reputation: 4645
PS: don't forget to add a unique
constraint on the username
column.
Upvotes: 3
Reputation: 50970
Reasons for using an integer non-intelligent primary key in place of a unique text key:
Speed of JOINs and other queries.
Ability to change the text key value and preserve the integrity of the database without have to update every table in which the key is referenced.
Increased code efficiency and reduced memory overhead when your application must build a list of key values in memory.
Decreased size of tables which reference the key.
Upvotes: 9
Reputation: 1906
One key advantage is by having a separate ID you are able to change the username without breaking any of the foreign key relations.
Additionally, it is good RDBMS practice to have primary keys that have no meaning - a username obviously has meaning
Upvotes: 1
Reputation: 46903
This is what is known as a Surrogate Key. It is not duplicating data, it is standing in for data. Mostly it exists simply because it's easier, especially when the natural key is multiple fields, and then you have to use that as a foreign key in another table.
They are common, but there is healthy debate about if they are "correct" or not. Personally, I use them just because it simplifies life. Another bonus is that if you use a natural key and someone typos the key, you end up having to do a cascading update across multiple tables if it's a foreign key. Since it is impossible to typo a surrogate key (it being system generated), this is never an issue.
Upvotes: 2
Reputation: 78105
At least one reason - you can change a Username inexpensively if it's not the primary key.
Upvotes: 1