Reputation: 361
I've been trying to read up on the use of clustered indexes as a means of improving query performance.
Essentially, I have a 'Messages' table, which is used in a chat application. When the user opens a chat, we read the message history from the table based on the customer the user is speaking to.
The messages table is structured with the columns:
Id : CustomerId : Content
The query most often used on the table will be something like
SELECT * FROM Message WHERE CustomerId = @CustomerId
My question is, is the CustomerId column a suitable candidate for a clustered index? Also, given that the ID field is the primary key, will SQL Server still need to 'uniqueify' the clustered index?
The table will be heavy on both selects and inserts.
Upvotes: 2
Views: 4517
Reputation: 294227
Adding a clustered index on a key on which there is already a primary key constrain is an unnecessary duplication. Instead the primary key constrain should also be the clustered index. However, your question is actually different...
is the CustomerId column a suitable candidate for a clustered index?
One cannot answer this without knowing how you will be querying the table. There are numerous query patterns for which this organization will not be optimal (typical examples being time series where the time column is the appropriate clustered key). And there are just as many examples of queries for which this is the best clustered index. Unanswerable.
given that the ID field is the primary key, will SQL Server still need to 'uniqueify' the clustered index?
If the index is not declared unique then SQL Server will add the uniquifier column. The column value will never materialize though, as no duplicates will ever occur.
I have a 'Messages' table, which is used in a chat application ...
WHERE CustomerId = @Id
... the ID field is the primary key
Pardon me, but this does not make any sense. What you're saying is that the Messages
table can have only one message from each customer. That would make a horrible chat experience. I'm pretty sure your explanation is wrong.
I would expect a Customers
table with clustered index and primary key constraint on CustomerId
. The Messages
table is likely to be organized by chat room, or some other group organization that pairs the chat participants. If the chats are always between one and exactly one Customer and a representative, then the 'chat room' may be the customer itself. In any way, the typical querying of such a Messages
table would want all the Messages exchanged in a chat room, in the order posted or all the Messages exchanged with a customer, in the order posted. This is, actually, a partitioned time series and best served by a clustered index like (chat_id, post_time)
or (customer_id, post_time)
. Note that this is not the primary key, the table may well have a message_id
as primary key, but non-clustered.
Upvotes: 0
Reputation: 1269583
The clustered index does not need to be unique, so it is possible.
However, the issue is that each time a new message is inserted, SQL Server needs to find a space for the new row next to the other rows for the same customer. This can often be inefficient, because pages need to be split, resulting in many half-filled pages. And, things get even more complicated if you have deletes on the rows as well.
There are several options. In a busy database, you can leave room on the pages for additional inserts. Or, another option is to partition the table based on the customer id. It all depends.
Under most circumstances, an identity column on the messages table would be the primary key and the clustered key as well. An additional index on the customer table would be sufficient. But, there are definitely alternative structures that can work better in some scenarios.
Upvotes: 2