Reputation: 30636
Via this link, I know that a GUID is not good as a clustered index, but it can be uniquely created anywhere. It is required for some advanced SQL Server features like replication, etc.
Is it considered bad design if I want to have a GUID column as a typical Primary Key ? Also this assumes a separate int identity column for my clustering ID, and as an added bonus a "user friendly" id?
update
After viewing your feedback, I realise I didn't really word my question right. I understand that a Guid makes a good (even if its overkill) PK, but a bad clustering index (in general). My question more directly asked, is, is it bad to add a second "int identity" column to act as the clustering index?
I was thinking that the Guid would be the PK and use it to build all relationships/joins etc. Then I would instead of using a natural key for the Cluster Index, I would add an additional "ID" that not data-specific. What I'm wondering is that bad?
Upvotes: 6
Views: 2765
Reputation: 27294
It is not bad design at all, an int Identity for your clustering key gives you a number of good benefits (Narrow,Unique,Ascending) whilst keeping the GUID for functionality purposes very separate and acting as your primary key.
If anything I would suggest you have the right approach, although the "user friendly" ID is the most questionable part - as in what purpose is it there to serve.
Addendum : I should put in the obligatory link to (possibly?) the most read article about the topic by Kimberley Tripp. http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
Upvotes: 0
Reputation: 20485
Using a GUID is lazy -- i.e., the DBA can't be bothered to model his data properly. Also it offers very bad join performance -- typically (16-byte type with poor locality).
Is it a bad design, if I want to have a GUID column as my typical Primary Key, and a separate, int identity column for my clustering ID, and as an added bonus a "user friendly" id?
Yes it is very bad -- firstly you don't want more than one "artificial" candidate key for your table. Secondly, if you want a user friendly id to use as keys just use a fixed length type such as char[8] or binary(8) -- preferably binary as the sort won't use the locale; you could use 16-byte types however you will notice a deterioration in performance -- however not as bad as GUID's. You can use these fixed types to build your own user-friendly allocation scheme that preserves some locality but generates sensible and meaningful id's.
As an Example:
If you are writing some sort of a CRM system (lets say online insurance quotes) and you want an extremely user friendly type for example a insurance quote reference (QR) that looks like so "AD CAR MT 122299432".
In this case -- since the quote length huge -- I would create a separate LUT/Symboltable to resolve the quote reference to the actual identifier used. but I will divorce this LUT from the rest of the model, I will never use the quote reference anywhere else in the model, especially not in the table representing the QR's.
Create Table QRLut
{
bigint bigint_id;
char(32) QR;
}
Now if my model has one table that represents the QR and 20 other tables featuring the bigint QR as a foreign key -- the fact that a bigint is used will allow my DB to scale well -- the wider the join predicates the more contention is caused on the memory bus -- and the amount of contention on the memory bus determines how well your CPU's can be saturated (multiple CPU's).
You might think with this example that you could just place the user-friendly QR in the table that actually represents the quote, however keep in mind that SQL server gathers statistics on tables and indices, and you don't want to let the server make caching decisions based on the user-friendly QR -- since it is huge and wastefull.
Upvotes: 1
Reputation: 65435
You are right that GUIDs make good object identifiers, which are implemented in a database as primary keys. Additionally, you are right that primary keys do not need to be the clustered indices.
GUIDs share the same characteristics for clustered indexes as INT IDENTITY
columns, provided that the GUIDs are sequential. There is a NewSequentialID
specific to SQL Server, but there is also a generic algorithm for creating them called COMB GUID, based on combining the current datetime with random bytes in a way that retains a large degree of randomness while retaining sequentiality.
One thing to keep in mind, if you intend to use NHibernate at some point, is that NHibernate natively knows how to use the COMB GUID strategy - and NHibernate can even use it to do batch-inserts, something that cannot be done with INT IDENTITY
or NewSequentialID
. If you are inserting multiple objects with NHibernate, then it will be faster to use the COMB GUID strategy than either of the other two methods.
Upvotes: 0
Reputation: 56113
Consider using only GUID, but get your GUIDs using the NEWSEQUENTIALID
method (which allocates sequential values and so doesn't have the same clustering performance problems as the NEWID
method).
A problem with using a secondary INT key as an index is that, if it's an adequate index, why use a GUID at all? If a GUID is necessary, how can you use an INT index instead? I'm not sure whether you need a GUID, and if so then why: are you doing replication and/or merging between multiple databases? And if you do need a GUID then you haven't specified exactly how you intend to use the non-globally-unique INT index in that scenario.
Sounds like what you are saying is that I have not made a good case for using a Guid at all, and I agree I know its overkill, but my question I guess would be is it too much overkill?
I think it's convenient to use GUID instead of INT for the primary key, if you have a use case for doing so (e.g. multiple databases) and if you can tolerate the linear, O(1) loss of performance caused simply by using a bigger (16-byte) key (which results in there being fewer index instances per page of memory).
The bigger worry is the way in which using a (random) GUID could affect performance when it's used for clustering. To counter-act that:
Either, use something else (e.g. one of the record's natural keys) as the clustered index, even if you still use a GUID for the primary key
Or, let the clustered index be the same field as the GUID primary key, but use NewSequentialId() instead of NewId() to allocate the GUID values.
is it bad to insert an additional artifical "id" for clustering, since I'm not sure I'll have a good natural ID candidate for clustering?
I don't understand why you wouldn't prefer to instead use just the GUID with NewSequentialId(), which is I think is provided for exactly this reason.
Upvotes: 1
Reputation: 96572
If you are going to create the identity field anyway, use that as the primary key. Think about querying this data. Ints are faster for joins and much easier to specify when writing queries.
Use the GUID if you must for replication, but don't use it as a primary key.
Upvotes: 2
Reputation: 532465
What are you intending to accomplish with the GUID? The int identity
column will also be unique within that table. Do you actually need or expect to need the ability to replicate? If so, is using a GUID actually preferable in your architecture over handling identity columns through one of the identity range mangement options?
If you like the "pretty" ids generated using the Active Record pattern, then I think I'd try to use it instead of GUIDs. If you do need replication, then use one of the replication strategies appropriate for identity columns.
Upvotes: 1
Reputation: 50225
Personally, I would go this way:
An internally known identity field for your PK (one that isn't known to the end-user because they will inevitably want to control it somehow).
A user-friendly "ID" that is unique with respect to some business rule (enforced either in your app code or as a constraint).
A GUID in the future if it's ever deemed necessary (like if it's required for replication).
Now with respect to the clustered index, which you may or may not be confused about, consider this guide from MS for SQL Server 2000.
Upvotes: 0
Reputation: 8259
I think it is bad design to do it that way but I don't know if it is bad otherwise. Remember, SQLServer automatically assigns the clustered index to the Primary key. You would have to remove it after making the GUID the primary key. Also, you usually want your identity column to be your primary key. So doing what you are saying would confuse anyone who reads your code that doesn't look closely. I would suggest you make the ID column your primary key, identity column, and put the clustered index on it. Then make your GUID column a unique key, making it a non-clustered index and not allowing nulls. That in affect will do what you want but will follow more of the standard.
Upvotes: 0