Reputation: 46591
Lately, I have been doing some reading on indexes of all types and the main advice is to put the clustered index on the primary key of the table, but what if the primary key actually is not used in a query (via a select or join) and is just put for purely relational purposes, so in this case it is not queried against. Example, say I have a car_parts
table and it contains 3 columns, car_part_id
, car_part_no
, and car_part_title
. car_part_id
is the unique primary key identity column. In this case car_part_no
is unique as well and is most likely car_part_title
. car_part_no
is what is most queried against, so doesn't it make sense to put the clustered index on that column instead of car_part_id
? The basics of the question is what column should actually have the clustered index since you are only allowed one of them?
Upvotes: 13
Views: 7616
Reputation: 754348
Kimberly Tripp is always one of the best sources on insights on indexing.
See her blog post "Ever-increasing clustering key - the Clustered Index Debate - again!" in which she quite clearly lists and explains the main requirements for a good clustering key - it needs to be:
and best of all, if you can manage:
Taking all this into account, an INT IDENTITY
(or BIGINT IDENTITY
if you really need more than 2 billion rows) works out to be the best choice in the vast majority of cases.
One thing a lot of people don't realize (and thus don't take into account when making their choice) is the fact that the clustering key (all the columns that make up the clustered index) will be added to each and every index entry for each and every non-clustered index on your table - thus the "narrow" requirement becomes extra important!
Also, since the clustering key is used for bookmark lookups (looking up the actual data row when a row is found in a non-clustered index), the "unique" requirement also becomes very important. So important in fact, that if you choose a (set of) column(s) that is/are not guaranteed to be unique, SQL Server will add a 4-byte uniquefier to each row --> thus making each and every of your clustered index keys extra wide ; definitely NOT a good thing.
Marc
Upvotes: 5
Reputation: 294227
An index, clustered or non clustred, can be used by the query optimizer if and only if the leftmost key in the index is filtered on. So if you define an index on columns (A, B, C), a WHERE condition on B=@b
, on C=@c
or on B=@b AND C=@c
will not fully leverage the index (see note). This applies also to join conditions. Any WHERE filter that includes A
will consider the index: A=@a
or A=@a AND B=@b
or A=@a AND C=@c
or A=@a AND B=@b AND C=@c
.
So in your example if you make the clustred index on part_no
as the leftmost key, then a query looking for a specific part_id
will not use the index and a separate non-clustered index must exist on part-id
.
Now about the question which of the many indexes should be the clustered one. If you have several query patterns that are about the same importance and frequency and contradict each other on terms of the keys needed (eg. frequent queries by either part_no
or part_id
) then you take other factors into consideration:
Note: not fully leverage as sometimes the engine will choose an non-clustered index to scan instead of the clustered index simply because is narrower and thus has fewer pages to scan. In my example if you have an index on (A, B, C) and a WHERE filter on B=@b
and the query projects C
, the index will be likely used but not as a seek, as a scan, because is still faster than a full clustered scan (fewer pages).
Upvotes: 9
Reputation: 32575
Keep in mind the usage patterns; If you are almost always querying the DB on the car_part_no, then it would probably be beneficial for it to be clustered on that column.
However, don't forget about joins; If you are most often joining to the table and the join uses the car_part_id field, then you have a good reason to keep the cluster on car_part_id.
Something else to keep in mind (less so in this case, but generally when considering clustered indexes) is that the clustered index will appear implicitly in every other index on the table; So for example, if you were to index car_part_title, that index will also include the car_part_id implicitly. This can affect whether or not an index covers a query and also affects how much disk space the index will take (which affects memory usage, etc).
Upvotes: 2
Reputation: 116977
The clustered index should go on the column that will be the most queried. This includes joins, as a join must access the table just like a direct query, and find the rows indicated.
You can always rebuild your indexes later on if your application changes and you find you need to optimize a table with a different index structure.
Some additional guidelines for deciding on what to cluster your table on can be found on MSDN here: Clustered Index Design Guidelines.
Upvotes: 1
Reputation: 10610
Clustered indexes are good when you query ranges of data. For example
SELECT * FROM theTable WHERE age BETWEEN 10 AND 20
The clustered index arranges rows in the particular order on your computer disk. That's why rows with age = 10 will be next to each other, and after them there will be rows with age = 11, etc.
If you have exact select, like this:
SELECT * FROM theTable WHERE age = 20
the non-clustered index is also good. It doesn't rearrange data on your computer disk, but it builds special tree with a pointers to the rows you need.
So it strongly depends on the type of queries you perform.
Upvotes: 4