Andreas Zita
Andreas Zita

Reputation: 7560

Clustered index on foreign key or primary key?

I have a table Item with autoinc int primary key Id and a foreign key UserId.

And I have a table User with autoinc int primary key Id.

Default is that the index for Item.Id gets clustered.

I will mostly query items on user-id so my question is: Would it be better to set the UserId foreign key index to be clustered instead?

Upvotes: 2

Views: 1536

Answers (5)

Stan
Stan

Reputation: 1999

The answer depends only on usage scenario. For example, Guffa tolds that data will be fragmented. That's wrong. If your queries depends mostly on UserId, then data clustered by ItemId is fragmented for you, because items for same user may be spreaded over a lot of pages.

Of course, compared to sequential ItemId (if it is sequential in your schema), using UserId as clustered key can cause page splits while inserting. This is two additional page writes at maximum. But when you're selecting by some user, his items may be fragmented over tens of pages (depends on items per user, item size, insertion strategy, etc) and therefor a lot of page reads. If you have a lof of such selects per single single insert (very often used web/olap scenarios), you can face hundreds of IO operations compared to few ones spent on page splitting. That was the clustering index was created for, not only for clustering by surrogate IDs.

So there is no clear answer, are the clustered UserId in your case good or bad, because this is highly depends on context. What is ratio between selects/inserts operations? How fragmented user ids are if clustered by itemid? How many additional indicies are on the table, because there is a pitfall (below) in sql server.

As you might know, clustered index requires unique values. This is not a big problem, because you can create index on pair (UserId, ItemId). Clustered index isn't itself stored on disk, so no matter how many fields are there. But non-clustered indices store clustered index values in their leaves. So if you have clustered index on UserId+ItemId (lets imagine their type is [int] and size is 8 bytes) and non-clustered index on ItemId, then this index will have twice size (8 bytes per a b-tree leaf) compared to just the ItemId as clustered index (4 bytes per a leaf).

Upvotes: 2

Guffa
Guffa

Reputation: 700212

Having the clustered index on the identity field has the advantage that the records will be stored in the order that they are created. New records are added at the end of the table.

If you use the foreign key as clustered index, the records will be stored in that order instead. When you create new records the data will be fragmented as records are inserted in the middle, which can reduce performance.

If you want an index on the foreign key, then just add a non-clustered index for it.

Upvotes: 3

Nicholas Carey
Nicholas Carey

Reputation: 74197

In general, you want to cluster on the most frequently accessed index. But you're not required to have a clustering index at all. You (or your DBAs) need to evaluate things and weigh the advantages and disadvantages so as to choose the most appropriate indexing strategy.

If you cluster on a monotonic counter like an identity column, all new rows are going to be inserted at the end of the table: that means a "hot spot" is created that is likely to cause lock contention on inserts, since every SPID doing an insert is hitting the same data page.

Tables without a clustering index have their data pages organized as a heap, pretty much just a linked list of data pages.

SQL Server indexes are B-trees. For non-clustered indexes, the leaf nodes of the B-tree are pointers to the appropriate data page. That means if the index is used and doesn't cover the query's columns, an additional look aside has to be done to fetch the data page. That means additional I/O and paging.

Clustered indices are different: their leaf nodes are the data pages themselves, meaning the heap essentially goes away: a table scan means a traversal of the clustering index's B-tree. The advantage is that once you've found what you need in the clustered index, you already have the data page you need, thus avoiding the additional I/O that a seek on a non-clustered index is likely to requir. The disadvantage, of course, is that the clustered index is larger, since it carrys the entire table with it, so traversals of the clustered index are more expensive.

Upvotes: 1

JohnLBevan
JohnLBevan

Reputation: 24410

Possibly.

Is the item.user-id column a unique column within your item table? If not you'd need to make this a clustered primary key by adding a second (possibly more) column to the key to make it unique / possibly this will add additional overhead that you'd not anticipated.

Are there any relationships with the item.id column? If so those may be important to the performance of your application so should be taken into account.

How often is the item.user-id value likely to change? If not at all that counts in its favour; the more often it's likely to be updated the worse, since that leads to fragmentation.

My recommendation would be to build you app with the regular item.id as clustered key, the later once you've got some data try (in a test system using a copy of your production data) switching the clustered index and testing its impact; that way you can easily see real results rather than trying to guess the multitude of possibilities. This avoids premature optimisation / ensures you make the correct choice.

Upvotes: 0

Avneesh
Avneesh

Reputation: 654

clustered index is created on primary key so what you can do is leave that as clustered and then create a non clustered index on the user Id of item. This will still be very fast as user. Id column will be clustered index.

Upvotes: 0

Related Questions