AbrahamJP

Reputation: 3430

Should primary keys always be assigned as the clustered index?

I have a SQL Server table that stores employee details. The ID column is of GUID type, while the EmployeeNumber column is of INT type. Most of the time I will be dealing with EmployeeNumber in joins and selection criteria.

My question is: is it sensible to put the primary key on the ID column and the clustered index on EmployeeNumber?

Upvotes: 11

Views: 11035

Answers (6)

Marco Guignard

Reputation: 653

Using a clustered index on something other than the primary key will improve the performance of SELECT queries that can take advantage of this index.

But you will lose performance on UPDATE queries, because in most scenarios they rely on the primary key to find the specific row you want to update.

INSERT queries can also lose performance: when a new row lands in the middle of the index, existing rows may have to be moved physically to make room (a page split). This won't happen with an incrementing primary key, because new records are always added at the end and no other rows have to move.

If you don't know which kind of operation needs the most performance, I recommend leaving the clustered index on the primary key and using nonclustered indexes on common search criteria.
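
A minimal sketch of that arrangement, assuming an incrementing INT primary key as in the previous paragraph. The table, column, and index names are hypothetical and do not come from the question:

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE dbo.EmployeeIdentityPk
(
    EmployeeId     int IDENTITY(1,1) NOT NULL,
    EmployeeNumber int               NOT NULL,
    LastName       nvarchar(100)     NOT NULL,
    -- The clustered index stays on the incrementing primary key,
    -- so new rows are appended at the end of the index.
    CONSTRAINT PK_EmployeeIdentityPk PRIMARY KEY CLUSTERED (EmployeeId)
);

-- Nonclustered index on a column that is commonly used for searching.
CREATE NONCLUSTERED INDEX IX_EmployeeIdentityPk_EmployeeNumber
    ON dbo.EmployeeIdentityPk (EmployeeNumber);
```

Whether this beats clustering on the search column depends on the workload, so it is worth measuring both layouts.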

Upvotes: 1

JNK

Reputation: 65147

The ideal clustered index key is:

  1. Sequential
  2. Selective (no dupes, unique for each record)
  3. Narrow
  4. Used in Queries

In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.
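
One way to see this for yourself is to load a table clustered on a random GUID and then ask SQL Server how fragmented the index is. This is only a sketch: the table name, batch sizes, and row counts are made up for illustration, and the numbers you get will vary.

```sql
-- Hypothetical demo table with a random GUID as the clustered key.
CREATE TABLE dbo.EmployeeGuidClustered
(
    ID             uniqueidentifier NOT NULL
        CONSTRAINT DF_EmployeeGuidClustered_ID DEFAULT NEWID(),
    EmployeeNumber int NOT NULL,
    CONSTRAINT PK_EmployeeGuidClustered PRIMARY KEY CLUSTERED (ID)
);

-- Insert in several batches so that later rows land between earlier ones.
DECLARE @batch int = 0;
WHILE @batch < 20
BEGIN
    INSERT INTO dbo.EmployeeGuidClustered (EmployeeNumber)
    SELECT TOP (1000)
           @batch * 1000 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;

    SET @batch += 1;
END;

-- Report fragmentation for every index on the demo table.
SELECT index_type_desc,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.EmployeeGuidClustered'), NULL, NULL, 'LIMITED');
```

Repeating the same load against a table clustered on an INT IDENTITY column gives a baseline to compare against.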

EDIT FOR CLARITY:

PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.

In practice, in my own experience, the field that is your PK usually should be your clustered key as well, since it tends to meet the criteria listed above.

Upvotes: 11

IamIC

Reputation: 18239

Since EmployeeNumber is unique, I would make it the PK. In SQL Server, a PK is often a clustered index.

Joins on GUIDs are just horrible. @JNK answers this well.

Upvotes: 0

Remus Rusanu

Reputation: 294217

Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary key becomes the clustered index key too, but this is not a requirement.

The primary key is a logical concept: it is the key used in your data model to reference entities.
The clustered index key is a physical concept: it is the order in which you want the rows to be stored on disk.
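
Applied to the table in the question, the split could look roughly like this. The column names come from the question; the table, constraint, and index names are my own, and the UNIQUE keyword assumes that EmployeeNumber really is unique:

```sql
CREATE TABLE dbo.Employee
(
    ID             uniqueidentifier NOT NULL,
    EmployeeNumber int              NOT NULL,
    -- NONCLUSTERED is stated explicitly; without it, SQL Server makes the
    -- primary key clustered when the table has no other clustered index.
    CONSTRAINT PK_Employee PRIMARY KEY NONCLUSTERED (ID)
);

-- Physical row order follows EmployeeNumber, the column used in joins.
-- Drop UNIQUE if duplicate employee numbers are possible.
CREATE UNIQUE CLUSTERED INDEX CIX_Employee_EmployeeNumber
    ON dbo.Employee (EmployeeNumber);
```

Other tables can still reference ID through foreign keys, because the primary key constraint is unaffected by which index is clustered.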

Choosing a different clustered key is driven by a variety of factors. One is key width, when you want a narrower clustered key than the primary key (because the clustered key gets replicated in every non-clustered index). Another is support for frequent range scans (common in time series), when the data is frequently accessed with queries like date between '20100101' and '20100201' (a clustered index key on date would be appropriate).

This subject has been discussed here ad nauseam before; see also "What column should the clustered index be put on?".
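
As a sketch of the range-scan case, here is a hypothetical time-series table (none of these names come from the question) clustered on its date column while keeping a separate non-clustered primary key:

```sql
-- Hypothetical time-series table; names are illustrative only.
CREATE TABLE dbo.TimeEntry
(
    TimeEntryId int IDENTITY(1,1) NOT NULL,
    EntryDate   date              NOT NULL,
    HoursWorked decimal(4,2)      NOT NULL,
    CONSTRAINT PK_TimeEntry PRIMARY KEY NONCLUSTERED (TimeEntryId)
);

-- Non-unique clustered index: rows are stored in EntryDate order.
CREATE CLUSTERED INDEX CIX_TimeEntry_EntryDate
    ON dbo.TimeEntry (EntryDate);

-- A date-range predicate can be answered by reading one contiguous
-- slice of the clustered index.
SELECT TimeEntryId, EntryDate, HoursWorked
FROM dbo.TimeEntry
WHERE EntryDate BETWEEN '20100101' AND '20100201';
```

Because the clustered key here is not unique, SQL Server adds a hidden uniquifier to duplicate key values, which makes the key slightly wider.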

Upvotes: 12

Conrad Frix

Reputation: 52645

A clustered index causes the data to be physically stored in the order of the index key. For this reason, clustered indexes help a lot when you query ranges of consecutive rows.

GUIDs make really bad clustered index keys, since their values follow no sensible pattern to order on. INT IDENTITY columns aren't much better unless order of entry helps (e.g. most recent hires).

Since you're probably not looking for ranges of employees, it probably doesn't matter much which column is the clustered index, unless you can segment off blocks of employees that you often aren't interested in (e.g. by termination date).

Upvotes: 0

Daniel Pratt

Reputation: 12077

First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).

Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There are so-called 'sequential' GUIDs, but I would consider them a bit of a hack.
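
For reference, SQL Server exposes sequential GUIDs through NEWSEQUENTIALID(), which can only be used in a DEFAULT constraint on a uniqueidentifier column. A minimal sketch with hypothetical table and constraint names:

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE dbo.EmployeeSequentialGuid
(
    -- Each generated value is greater than the ones generated before it
    -- on the same machine since Windows last started.
    ID             uniqueidentifier NOT NULL
        CONSTRAINT DF_EmployeeSequentialGuid_ID DEFAULT NEWSEQUENTIALID(),
    EmployeeNumber int NOT NULL,
    CONSTRAINT PK_EmployeeSequentialGuid PRIMARY KEY CLUSTERED (ID)
);

-- The ID is filled in by the default; the caller never supplies it.
INSERT INTO dbo.EmployeeSequentialGuid (EmployeeNumber)
VALUES (1001);
```

The sequence can restart from a lower range after a server restart, and the key is still 16 bytes wide, so this reduces the random-insert problem without addressing key width.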

Upvotes: 2
