Callum Evans
Callum Evans

Reputation: 361

SQL Server Clustered Index on Non-Unique Column

I've been trying to read up on the use of clustered indexes as a means of improving query performance.

Essentially, I have a 'Messages' table, which is used in a chat application. When the user opens a chat, we read the message history from the table based on the customer the user is speaking to.

The messages table is structured with the columns:

Id : CustomerId : Content

The query most often used on the table will be something like

SELECT * FROM Message WHERE CustomerId = @CustomerId

My question is, is the CustomerId column a suitable candidate for a clustered index? Also, given that the ID field is the primary key, will SQL Server still need to 'uniqueify' the clustered index?

The table will be heavy on both selects and inserts.

Upvotes: 2

Views: 4517

Answers (2)

Remus Rusanu
Remus Rusanu

Reputation: 294227

Adding a clustered index on a key on which there is already a primary key constrain is an unnecessary duplication. Instead the primary key constrain should also be the clustered index. However, your question is actually different...

is the CustomerId column a suitable candidate for a clustered index?

One cannot answer this without knowing how you will be querying the table. There are numerous query patterns for which this organization will not be optimal (typical examples being time series where the time column is the appropriate clustered key). And there are just as many examples of queries for which this is the best clustered index. Unanswerable.

given that the ID field is the primary key, will SQL Server still need to 'uniqueify' the clustered index?

If the index is not declared unique then SQL Server will add the uniquifier column. The column value will never materialize though, as no duplicates will ever occur.

I have a 'Messages' table, which is used in a chat application ... WHERE CustomerId = @Id ... the ID field is the primary key

Pardon me, but this does not make any sense. What you're saying is that the Messages table can have only one message from each customer. That would make a horrible chat experience. I'm pretty sure your explanation is wrong.

I would expect a Customers table with clustered index and primary key constraint on CustomerId. The Messages table is likely to be organized by chat room, or some other group organization that pairs the chat participants. If the chats are always between one and exactly one Customer and a representative, then the 'chat room' may be the customer itself. In any way, the typical querying of such a Messages table would want all the Messages exchanged in a chat room, in the order posted or all the Messages exchanged with a customer, in the order posted. This is, actually, a partitioned time series and best served by a clustered index like (chat_id, post_time) or (customer_id, post_time). Note that this is not the primary key, the table may well have a message_id as primary key, but non-clustered.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269583

The clustered index does not need to be unique, so it is possible.

However, the issue is that each time a new message is inserted, SQL Server needs to find a space for the new row next to the other rows for the same customer. This can often be inefficient, because pages need to be split, resulting in many half-filled pages. And, things get even more complicated if you have deletes on the rows as well.

There are several options. In a busy database, you can leave room on the pages for additional inserts. Or, another option is to partition the table based on the customer id. It all depends.

Under most circumstances, an identity column on the messages table would be the primary key and the clustered key as well. An additional index on the customer table would be sufficient. But, there are definitely alternative structures that can work better in some scenarios.

Upvotes: 2

Related Questions