Do non-clustered (non-PK) indexes need to include clustered (PK) columns?

Question

For example, here are two indexes on the FOO table:

ALTER TABLE [dbo].[FOO] 
    ADD CONSTRAINT [PK_FOO] PRIMARY KEY CLUSTERED ([id] ASC)

CREATE NONCLUSTERED INDEX [IX_FOO] 
    ON [dbo].[FOO] ([id] ASC, [a] ASC, [b] ASC)

Many queries use IX_FOO when filtering on column a. The id column in IX_FOO seems redundant, since PK_FOO already indexes it. So I'm thinking of removing the id column from IX_FOO like this:

CREATE NONCLUSTERED INDEX [IX_FOO2] 
    ON [dbo].[FOO] ([a] ASC, [b] ASC) 
    INCLUDE ([id])

But I'm not sure myself. Do indexes need to include PK columns?

Vladimir Baranov · Accepted Answer

The order of columns in an index is important.

An index on (ID, a, b) is very different from an index on (a, b) include (id).

The first index on (ID, a, b) can be used when searching by:

ID, a, b: WHERE ID = 1 AND a = 2 AND b = 3
ID, a, range of b: WHERE ID = 1 AND a = 2 AND b > 3 AND b < 5
ID, a: WHERE ID = 1 AND a = 2
ID, range of a: WHERE ID = 1 AND a > 2 AND a < 5
ID: WHERE ID = 1
range of ID: WHERE ID > 1 AND ID < 5

This index cannot be used for a seek when searching just for a or just for b: WHERE a = 2; WHERE b = 3. At best the engine can scan it.

Selected columns can be any combination of ID, a, b. If the query selects more columns, the index alone is not enough and the engine has to read them from the table.

The second index on (a, b) include (id) can be used when searching by:

a, b: WHERE a = 2 AND b = 3
a, range of b: WHERE a = 2 AND b > 3 AND b < 5
a: WHERE a = 2
range of a: WHERE a > 2 AND a < 5

This index cannot be used for a seek when searching just for ID or just for b: WHERE ID = 1; WHERE b = 3.

Here too, selected columns can be any combination of ID, a, b. If the query selects more columns, the index alone is not enough and the engine has to read them from the table.
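The difference between the two indexes can be seen by running a few queries with the actual execution plan enabled. This is only a sketch: it assumes dbo.FOO has columns id, a and b and that both indexes from the question exist, and the plan the optimizer actually picks depends on table size and statistics.

```sql
-- Sketch only: assumes dbo.FOO(id, a, b) with PK_FOO, IX_FOO and IX_FOO2 in place.
-- Enable "Include Actual Execution Plan" in SSMS to see which index is used.
SET STATISTICS IO ON;

-- Filters on a (leading column of IX_FOO2) plus a range of b:
-- IX_FOO2 can serve this with an index seek.
SELECT id, a, b
FROM dbo.FOO
WHERE a = 2 AND b > 3 AND b < 5;

-- Filters on id (leading column of IX_FOO and of the clustered PK_FOO):
-- a seek is possible on either of those, but not on IX_FOO2.
SELECT id, a, b
FROM dbo.FOO
WHERE id = 1;

-- Filters only on b: no index leads with b, so no seek is possible
-- and the engine has to scan an index or the table.
SELECT id, a, b
FROM dbo.FOO
WHERE b = 3;

SET STATISTICS IO OFF;
```

Comparing the logical reads reported for each query gives a rough idea of how much a seek saves over a scan on your data.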

If there is a clustered index on ID, there is no point in adding ID as an INCLUDE column to a non-clustered index, because the clustered key columns are implicitly included in every non-clustered index. This is why a narrow clustered index is generally recommended, usually a 4-byte int. The wider the clustered index key, the more space each non-clustered index needs.

So, in your case, the index on (a, b) include (id) is the same as the index on (a, b).

I'm not sure whether the engine is smart enough not to waste disk space if you INCLUDE ID explicitly, but it is easy to check.
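One way to run that check is to build both variants side by side and compare their page counts. The script below is a sketch, not a definitive test: the index names IX_FOO_with_include and IX_FOO_without_include are made up for illustration, and the table needs enough rows for a difference of a few bytes per row to show up as extra pages.

```sql
-- Hypothetical index names, chosen only for this comparison.
CREATE NONCLUSTERED INDEX [IX_FOO_with_include]
    ON [dbo].[FOO] ([a] ASC, [b] ASC)
    INCLUDE ([id]);

CREATE NONCLUSTERED INDEX [IX_FOO_without_include]
    ON [dbo].[FOO] ([a] ASC, [b] ASC);

-- Pages used by every index on dbo.FOO (one page is 8 KB).
SELECT i.name                       AS index_name,
       SUM(ps.used_page_count)      AS used_pages,
       SUM(ps.used_page_count) * 8  AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.FOO')
GROUP BY i.name
ORDER BY i.name;

-- Clean up the test indexes afterwards.
DROP INDEX [IX_FOO_with_include] ON [dbo].[FOO];
DROP INDEX [IX_FOO_without_include] ON [dbo].[FOO];
```

If the two test indexes report the same number of used pages, the explicit INCLUDE costs no extra space on that table.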
