Aaron Fischer
Aaron Fischer

Reputation: 21231

What should I consider when selecting a data type for my primary key?

When I am creating a new database table, what factors should I take into account for selecting the primary key's data type?

Upvotes: 6

Views: 2961

Answers (16)

rg01
rg01

Reputation:

I usually always use an integer, but here's an interesting perspective.

https://blog.codinghorror.com/primary-keys-ids-versus-guids/

Upvotes: 0

Philippe Grondier
Philippe Grondier

Reputation: 11148

Sorry to do that, but I found that the answers I gave to related questions (you can check this and this) could apply to this one. I reshaped them a little bit...

You will find many posts dealing with this issue, and each choice you'll make has its pros and cons. Arguments for these usually refer to relational database theory and database performance.

On this subject, my point is very simple: surrogate primary keys ALWAYS work, while Natural keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons: field too short, rules change, etc.

To this point, you've guessed here that I am basically a member of the uniqueIdentifier/surrogate primary key team, and even if I appreciate and understand arguments such as the ones presented here, I am still looking for the case where "natural" key is better than surrogate ...

In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and productivity:

each time I create a table, shall I lose time

  1. identifying its primary key and its physical characteristics (type, size)
  2. remembering these characteristics each time I want to refer to it in my code?
  3. explaining my PK choice to other developers in the team?

My answer is no to all of these questions:

  1. I have no time to lose trying to identify "the best Natural Primary Key" when the surrogate option gives me a bullet-proof solution.
  2. I do not want to remember that the Primary Key of my Table_whatever is a 10 characters long string when I write the code.
  3. I don't want to lose my time negotiating the Natural Key length: "well if You need 10 why don't you take 12 to be on the safe side?". This "on the safe side" argument really annoys me: If you want to stay on the safe side, it means that you are really not far from the unsafe side! Choose surrogate: it's bullet-proof!

So I've been working for the last five years with a very basic rule: each table (let's call it 'myTable') has its first field called 'id_MyTable' which is of uniqueIdentifier type. Even if this table supports a "many-to-many" relation, where a field combination offers a very acceptable Primary Key, I prefer to create this 'id_myManyToManyTable' field being a uniqueIdentifier, just to stick to the rule, and because, finally, it does not hurt.

The major advantage is that you don't have to care anymore about the use of Primary Key and/or Foreign Key within your code. Once you have the table name, you know the PK name and type. Once you know which links are implemented in your data model, you'll know the name of available foreign keys in the table.

And if you still want to have your "Natural Key" somewhere in your table, I advise you to build it following a standard model such as

Tbl_whatever

   id_whatever, unique identifier, primary key
   code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed
   .....

Where id_ is the prefix for primary key, and code_ is used for "natural" indexed field. Some would argue that the code_ field should be set as unique. This is true, and it can be easily managed either through DDL or external code. Note that many "natural" keys are calculated (invoice numbers), so they are already generated through code

I am not sure that my rule is the best one. But it is a very efficient one! If everyone was applying it, we would for example avoid time lost answering to this kind of question!

Upvotes: 11

Tom
Tom

Reputation: 1381

I usually go with a GUID column primary key for all tables (rowguid in mssql). What could be natural keys I make unique constraints. A typical example would be a produkt identification number that the user have to make up and ensure that is unique. If I need a sequence, like in a invoice i build a table to keep a lastnumber and a stored procedure to ensure serialized access. Or a Sequence in Oracle :-) I hate the "social security number" sample for natural keys as that number will never be alway awailable in a registration process. Resulting in a need for a scheme to generate dummy numbers.

Upvotes: 0

Noah Yetter
Noah Yetter

Reputation: 396

Unless you have an ultra-convenient natural key available, always use a synthetic (a.k.a. surrogate) key of a numeric type. Even if you do have a natural key available, you might want to consider using a synthetic key anyway and placing an additional unique index on your natural key. Consider what happened to higher-ed databases that used social security numbers as PKs when federal law changed, the costs of changing over to synthetic keys were enormous.

Also, I have to disagree with the practice of naming every primary key the same, e.g. "id". This makes queries harder to understand, not easier. Primary keys should be named after the table. For example employee.employee_id, affiliate.affiliate_id, user.user_id, and so on.

Upvotes: 4

Jan Gressmann
Jan Gressmann

Reputation: 5541

I don't really like what they teach in school, that is using a 'natural key' (for example ISBN on a bookdatabase) or even having a primary key made up off 2 or more fields. I would never do that. So here's my little advice:

  • Always have one dedicated column in every table for your primary key.
  • They all should have the same colomn name across all tables, i.e. "ID" or "GUID"
  • Use GUIDs when you can (if you don't need performance), otherwise incrementing INTs

EDIT:
Okay, I think I need to explain my choices a little bit.

  • Having a dedicated column namend the same across all table for you primary key, just makes your SQL-Statements a lot of easier to construct and easier for someone else (who might not be familiar with your database layout) easier to understand. Especially when you're doing lots of JOINS and things like that. You won't need to look up what's the primary key for a specific table, you already know, because it's the same everywhere.

  • GUIDs vs. INTs doesn't really matters that much most of the time. Unless you hit the performance cap of GUIDs or doing database merges, you won't have major issues with one or another. BUT there's a reason I prefer GUIDs. The global uniqueness of GUIDs might always come in handy some day. Maybe you don't see a need for it now, but things like, synchronizing parts of the database to a laptop / cell phone or even finding datarecords without needing to know which table they're in, are great examples of the advantages GUIDs can provide. An Integer only identifies a record within the context of one table, whereas a GUID identifies a record everywhere.

Upvotes: 7

Walter Mitty
Walter Mitty

Reputation: 18950

Use natural keys when they can be trusted. Some sources of natural keys can't be trusted. Years ago, the Social Security Administration used to occasionally mess up an assign the same SSN to two different people. Theyv'e probably fixed that by now.

You can probably trust VINs for vehicles, and ISBNs for books (but not for pamphlets, which may not have an ISBN).

If you use natural keys, the natural key will determine the datatype.

If you can't trust any natural keys, create a synthetic key. I prefer integers for this purpose. Leave enough room for reasonable expansion.

Upvotes: 0

JohnMcG
JohnMcG

Reputation: 8825

Numbers that have meaning in the real world are usually a bad idea, because every so often the real world changes the rules about how those numbers are used, in particular to allow duplicates, and then you've got a real mess on your hands.

Upvotes: 1

MartinHN
MartinHN

Reputation: 19812

A great factor is how much data you're going to store. I work for a web analytics company, and we have LOADS of data. So a GUID primary key on our pageviews table would kill us, due to the size.

A rule of thumb: For high performance, you should be able to store your entire index in memory. Guids could easily break this!

Upvotes: 0

Matthias Meid
Matthias Meid

Reputation: 12521

  • Where do you generate it? Incrementing number's don't fit well for keys generated by the client.
    • Do you want a data-dependent or independent key (sometimes you could use an ID from business data, can't say if this is always useful or not)?
    • How well can this type be indexed by your DB?

I have used uniqueidentifiers (GUIDs) or incrementing integers so far.

Cheers Matthias

Upvotes: 1

Jeffrey L Whitledge
Jeffrey L Whitledge

Reputation: 59553

Do not use a floating point numeric type, since floating point numbers cannot be properly compared for equality.

Upvotes: 2

Kibbee
Kibbee

Reputation: 66162

Whenever possible, try to use a primary key that is a natural key. For instance, if I had a table where I logged one record every day, the logdate would be a good primary key. Otherwise, if there is no natural key, just use int. If you think you will use more than 2 billion rows, use a bigint. Some people like to use GUIDs, which works well, as they are unique, and you will never run out of space. However, they are needlessly long, and hard to type in if you are just doing adhoc queries.

Upvotes: -1

Galwegian
Galwegian

Reputation: 42257

In most cases I use an identity int primary key, unless the scenario requires a lot of replication, in which case I may opt for a GUID.

I (almost) never used meaningful keys.

Upvotes: 6

Pradeep
Pradeep

Reputation: 3276

It all depends.

a) Are you fine having unique sequential numeric numbers as your primary key? If yes, then selecting UniqueIdentifier as your primary key will suffice. b) If your business demand is such that you need to have alpha numeric primary key, then you got to go for varchar or nvarchar.

These are the two options I could think of.

Upvotes: 0

Tony Andrews
Tony Andrews

Reputation: 132750

For a "natural" key, whatever datatype suits the column(s). Artifical (surrogate) keys are usually integers.

Upvotes: 0

Bob Probst
Bob Probst

Reputation: 9651

I'm partial to using an generated integer key. If you expect the database to grow very large, you can go with bigint.

Some people like to use guids. The pro there is that you can merge multiple instances of the database without altering any keys but the con is that performance can be affected.

Upvotes: 0

Ely
Ely

Reputation: 3240

If using a numeric key, make sure the datatype is giong to be large enough to hold the number of rows you might expect the table to grow to.

If using a guid, does the extra space needed to store the guid need to be considered? Will coding against guid PKs be a pain for developers or users of the application.

If using composite keys, are you sure that the combined columns will always be unique?

Upvotes: 7

Related Questions