Yarin
Yarin

Reputation: 183559

What are the reasons *not* to use a GUID for a primary key?

Whenever I design a database I automatically start with an auto-generating GUID primary key for each of my tables (excepting look-up tables)

I know I'll never lose sleep over duplicate keys, merging tables, etc. To me it just makes sense philosophically that any given record should be unique across all domains, and that that uniqueness should be represented in a consistent way from table to table.

I realize it will never be the most performant option, but putting performance aside, I'd like to know if there are philosophical arguments against this practice?

Based on the responses let me clarify:

I'm talking about consistently using a GUID surrogate key as a primary key- irrespective of whether and how any natural or sequential keys are designed on a table. These are my assumptions:

  1. Data integrity based on natural keys can be designed for, but not assumed.
  2. A primary key's function is referential integrity, irrespective of performance, sequencing, or data.

Upvotes: 26

Views: 6413

Answers (8)

Evan Nagle
Evan Nagle

Reputation: 5133

Jeff Atwood talks about this in great detail:
http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html

Guid Pros:
Unique across every table, every database, every server
Allows easy merging of records from different databases
Allows easy distribution of databases across multiple servers
You can generate IDs anywhere, instead of having to roundtrip to the database
Most replication scenarios require GUID columns anyway

Guid Cons:
It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

Upvotes: 15

supercat
supercat

Reputation: 81159

I wonder why there's no standard "miniGUID" type? It would seem that performing a decent hash on a GUID should yield a 64-bit number which would have a trivial probability of duplication in any universe which doesn't have a billion or more things in it. Since the universe in which most GUID/miniGUID identifiers are used will never grow beyond a million things, much less a billion, I would think a smaller 8-byte miniGuid would be very useful.

That would not, of course, suggest that it should be used as a clustered index; that would greatly impede performance. Nonetheless, an 8-byte miniGUID would only waste a third the space of a full GUID (when compared to a 4-byte index).

Upvotes: 1

user359040
user359040

Reputation:

Simple answer: it's not relational.

The record (as defined by the GUID) may be unique, but none of the associated attributes can be said to be occuring uniquely with that record.

Using a GUID (or any purely surrogate key) is no more relational than declaring a flat file to be relational, on the basis that each record can be identified by its row number.

Upvotes: 3

onedaywhen
onedaywhen

Reputation: 57023

I can see the case for a given application's or enterprise's own identifiers to be unique and be represented in a consistent way across all its own domains (i.e. because they may span more than one database) but a GUID is overkill for these purposes. I guess they are popular because they are available out of the box and designing and implementing an 'enterprise key' takes time and effort. The rule when designing an artifical identifier is to make it as simple as possible but no simpler. IDENTITY is too simple, a GUID isn't simple enough.

Entities that exist outside of the application/enterprise usually have their own identifiers (e.g. a car has a VIN, a book has an ISBN, etc) maintained by an external trusted source and in such cases the GUID adds nothing. So I guess the philosphical argument against I'm getting at here is that using a artifical identifier on every table is unnecessary.

Upvotes: 0

nvogel
nvogel

Reputation: 25526

You still implement the natural key of each table as well don't you? - GUID keys alone obviously won't prevent duplicate data, redundancy and consequent loss of data integrity.

Assuming that you do enforce other keys then adding GUIDs to every table without exception is probably just adding unnecessary complexity and overhead. It doesn't really make it easier to merge data in different tables because you still have to modify/de-duplicate the other key(s) of the table anyway. I suggest you should evaluate the use of a GUID surrogate on a case-by-case basis. Having a blanket rule for every table isn't necessary or helpful because every table models a different thing after all.

Upvotes: 4

Coxy
Coxy

Reputation: 8928

A potentially big reason, but one often not thought of, is if you might have to provide compatibility with an Oracle database in the future.

Since Oracle doesn't have a uniqueid column data type, it can lead to a bit of a nightmare when you have two different data types for the same primary key across two different databases, especially when an ORM is involved.

Upvotes: 1

marc_s
marc_s

Reputation: 754488

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table.

What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to. The main reason for this is indeed performance, which will come and bite you down the road... (it will, trust me - just a matter of time) - plus also a waste of resources (disk space and RAM in your SQL Server machine) which is really not necessary.

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: 25 MB vs. 106 MB - and that's just on a single table!

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

Upvotes: 18

Thomas
Thomas

Reputation: 64645

Adding to ewwwn:

Pros

  • It makes it nearly impossible for developers to "accidentally" expose the surrogate key to users (unlike integers where it happens almost all the time).
  • Makes merging databases several orders of magnitude simpler than dealing with identity columns.

Cons

  • Fatter. The real problem with it being fatter is that it eats up more space per page and more space in your indexes making them slower. The additional storage space of Guids is frankly irrelevant in today's world.
  • You absolutely must be careful about how new values are created. Truly random values do not index well. You are compelled to use a COMB guid or some variant that adds a sequential element to the guid.

Upvotes: 4

Related Questions