Reputation:
I have a user table where there are hundreds of millions of rows and there's a field username(varchar), should I make it a primary key instead of an unique index? what are the advantages or disadvantages of adding an extra field user_id(int) and make that a primary key instead? I don't see where I would use user_id except say on join condition where a join on int would be faster than join on varchar? or is it?(since both fields are indexed)
update: assume changing username is not an option.
Upvotes: 3
Views: 632
Reputation: 29
I would add another column to the existing key. Adding an additional field to the existing primary key is also known as a concatenated primary key.
https://www.relationaldbdesign.com/database-analysis/module2/concatenated-primary-keys.php
Upvotes: 0
Reputation: 185593
First of all, I second Frederik's comment: I am a firm believer in not ascribing any business or functional value to the primary key of a table. There may not be the option of changing the username now, but perhaps there will be later. Even if not, it's better to get into the habit and be consistent with all of your tables rather than mixing paradigms.
A secondary reason for using a numeric (or sequential in some way) primary key is insertion and update speed. While this can be changed, by default the primary key on a table is also the clustered index. The clustered index determines the physical order of the rows in the table, so inserting a value out of order causes the database engine to shift all of the rows after it down so that it can insert it in the proper location. With a table with millions of rows, that can be a non-trivial insert or update operation.
Upvotes: 3
Reputation: 3295
The reason I prefer a numeric PK is so I can easily allow the username to be changed.
If the username is also the primary key, it means that all records that relate to that user must also be changed when the username is changed.
Note that your database can generate the proper ID for the numeric PK through a number of means. On MySQL it's adding an "auto_increment" attribute to the field, on Postgres and Oracle it's through sequences.
If you have hundreds of millions of rows though, you are correct that you may be better using the username. I try to avoid having variant PK's floating around between tables, it just makes the whole thing harder to maintain for those who follow me into the code unless it's absolutely necessary.
Upvotes: 2
Reputation: 56934
I would prefer adding an extra field as the primary key.
The main reason, is that -imho- primary keys should have no 'business' value. A primary key is merely an adminstrative item, which is only important to the database so that integrity can be garantueed.
As Brian already mentionned, by adding a surrogate primary key, you can -in your case- allow that a user changes his username without problems.
The value of a primary key should never be changed: otherwise an update can become very expensive, when you have lots of foreign keys. All those changes should be cascaded to the related tables.
Next to that, an integer is for instance 4 bytes, while your usename column is much larger.
This does not only mean that you'll be taking up much more space in related tables, but it also means that your index will grow larger.
The buckets that make up your index will contain less 'record pointers', which means you'll have more buckets, which means your index will be slower.
Upvotes: 3