Reputation: 21

Identity field and primary key in SQL Server when values are unique

When a set of values that will be stored in a table have a name or a code that should be unique across the system, should it be created with a primary key of ID auto increment (int)?

Take the situation of State Abbreviations. Other than consistency, what would be the purpose of an ID on the table that was the primary key other than the state name or abbreviation?

If for example the foreign key from an shipping address referenced the state abbreviation that is not mutable then ... is there a purpose for having an auto increment int ID?

Upvotes: 2

Answers (3)

Szymon

Reputation: 43023

As a general rule (which may not apply in every single case), it's better to use integers as primary keys for performance reasons. So if your unique key is a string, create an autoincrement primary key.

Also, states don't have to be necessarily unique. It's true in one country but when you look at all countries in the world, same abbreviations may happen.

EDIT

I can't find a very good evidence of string vs. integer performance but take a look e.g. in here: Strings as Primary Keys in SQL Database

Having said that, there's never a lot of states so performance gain will be small in this case.

Upvotes: 2

Aaron Bertrand

Reputation: 280644

You highlighted one positive aspect of a separate table: consistency. It is much easier to have this:

CREATE TABLE dbo.States
(
  StateID TINYINT PRIMARY KEY,
  Name VARCHAR(32),
  Abbreviation CHAR(2)
);

CREATE TABLE dbo.CustomerAddresses
(
  AddressID INT PRIMARY KEY,
  ...,
  StateID TINYINT NOT NULL FOREIGN KEY REFERENCES dbo.States(StateID)
);

Than to have a trigger or check constraint like:

CHECK StateAbbreviation IN ('AL', 'AK', /* 50+ more states/territories... */)

Now, with something static and small like a 2-character state abbreviation, this design might make more sense, eliminating some unnecessary mapping between the abbreviations and some surrogate ID:

CREATE TABLE dbo.States
(
  Abbreviation CHAR(2) PRIMARY KEY,
  Name VARCHAR(32)
);

CREATE TABLE dbo.CustomerAddresses
(
  AddressID INT PRIMARY KEY,
  ...,
  StateAbbreviation CHAR(2) FOREIGN KEY REFERENCES dbo.States(Abbreviation)
);

This constrains the data to the known set of states, allows you to store the actual data in the table (which can eliminate a lot of joins in queries), actually saves you some space, and avoids having any messy hard-coded check constraints (or constraints using UDFs, or triggers validating the data).

That all said, there is no magic blanket answer that satisfies all designs. As your string gets larger, it can make more sense to use an integer instead of just storing the string. A counter-example would be storing all of the User Agent strings from your web logs - it makes a lot of sense to store the same string once and assign an integer to it, than to store the same 255-character string over and over and over again.

Other things that can make this design troublesome:

What if you expand beyond the US later?
Forget about state abbreviations for a moment (which are pretty static); what if your lookups are things that do change frequently?

Upvotes: 3

D Stanley

Reputation: 152654

State Abbreviation is a rare example of a good non-increment primary key for the following reasons:

They are small (2-character)
They don't change
The set of values is relatively static - new records are unlikely

Just because the natural key is unique doesn't make it a good candidate for the primary key.

Even real-world values that are unique (like SSN) may nod be good candidates if they are entered in by humans. For example, suppose someone enters in a bunch of related data for a person, then get a letter that the SSN is wrong - now you can't just update the primary key - you need to update all of the foreign keys as well!

Upvotes: 2

Identity field and primary key in SQL Server when values are unique

Answers (3)

Related Questions