Reputation: 3089

Need advice in choosing right columns for clustered index in SQL Server 2012 (among list of given options)

I am using SQL Server 2012.

So here is the table

GroupCategoryType 
(
  GroupCategoryTypeID int identity(1,1) not null, --It is also a PK
  GroupTypeID int,
  id int,
  GroupCategoryID int,
  UserID int
)  
-- Total Rows around 60,000

I'm setting up a custom clustered index on this table as existing clustered index on GroupCategoryTypeID key is not much help (as no filtering/join is being done over this PK).

I have chosen (GroupCategoryID, GroupTypeID, ID) as these three are the most frequently used in ('join','where')
And the order I've mentioned here is as per their decreasing order of frequency of usage.

Few points to note here about this (to be) clustered index are :

GroupTypeID can be either 1 or 2; it has least frequency of unique values i.e 2
GroupCategoryId has moderate frequency of unique values i.e 2200 # of unique values ID has the highest frequency of unique values i.e 30,000

None of the possible combinations of pairs or triplet of these columns is unique.
Here are the few stats :

group by GroupCategoryId,id returns count(*) from 2 to 1600 for 700 rows out of total 55,000 and remaining 54,300 are unique.
To make it more clear, I used select groupCategoryid,id,count(*) as total from GroupCategoryType group by groupCategoryid,id order by total desc, and this returned 55000 rows.

group by GroupCategoryId,GroupTypeID returns count(*) ranging from 2 to 3200 for 700 rows out of total 2500 and remaining 1800 are unique.

group by GroupTypeID,ID returns count(*) ranging from 2 to 1600 for 6500 rows out of total 42000 and remaining 40,400 are unique.

group by GroupCategoryid,grouptypeid,id (triplet) returns count(*) ranging from 2 to 1600 for 700 rows out of total 52000 and remaining 51,300 are unique.

Now, my questions are :

What should be the best clustered index as per all these stats? (Keeping in mind that GroupTypeid has just two unique values)
I know SQL Server will itself add a uniqueifier to make rows unique, but will it add this key to all the rows (even to those that are already unique) or just the duplicate ones ?
Considering Point #2 above, Should I add GroupCategoryTypeID to the clustered index to make all combinations unique?
If for example there is a query that uses just (col2,col3) or (col1,col3) or just (col3) in where clause, will the clustered index of (col1,col2,col3) still be used? If not, is should creating a non-clustered index on (col2,col3) or (col1,col3) and separately for (col3) the best solution for such cases?

Upvotes: 1

Answers (2)

Maurício Pontalti Neri

Reputation: 1080

What should be the best-clustered index as per all these stats? (Keeping in mind that GroupTypeid has just two unique values)

The best way to decide is analyzing the execution plan, On the top of my head, my choice for these key is (GroupTypeID, GroupCategoryID, ID). With this configuration, the head can read more linear avoid jumps just because are only two registers

I know SQL Server will itself add a uniqueifier to make rows unique, but will it add this key to all the rows (even to those that are already unique) or just the duplicate ones?

The unique-identifiers inserted by SQL Server are used to control transactions they are not to use as mundane primary-key.

Considering Point #2 above, Should I add GroupCategoryTypeID to the clustered index to make all combinations unique?

Not necessarily, your clustered index must be the best for you use is not always the PK, in some cases, you can have a clustered index not unique. It is not usual but is possible.

If for ex. there is a query, that uses just (col2,col3) or (col1,col3) or just (col3) in where clause, will the clustered index of (col1,col2,col3) still be used ? If Not, then should creating a non-clustered index on (col2,col3) or (col1,col3) and separately for (col3), a best solution for such cases?

Again, you need to check the execution plan, in some environments, a table with 60k register is too small and SQL Server, in others, this table is so big that you need a lot of indexes.

Upvotes: 1

Alen

Reputation: 192

A few times I've had success creating clustered indexes against the best practice of int/bigint and unique values. In one case I suggested creating a clustered index on a bigint column that was not even close to unique and it solved a big blocking problem we had.

The table in question, the most intensive queries would select millions of rows in some instances using that column as a filter. At first we had the clustered index on another bigint column that was unique in the table and it created a lot of locking issues.

I would look at your queries against this table and create the clustered index that gives the best performance and not necessarily in accordance with best practices on paper.

Upvotes: 0

Need advice in choosing right columns for clustered index in SQL Server 2012 (among list of given options)

Answers (2)

Related Questions