Dave New
Dave New

Reputation: 40092

A single non-clustered index on two columns or a separate index on each column?

Let's say I have the following table:

Table: RelationshipType
============================================================
|  ID (PK)  |  ParentID  |  ChildID  |  RelationshipType   |
============================================================

There are mostly cases where ParentID and ChildID are selected on individually:

... WHERE ParentID = @SomeID

... WHERE ChildID  = @SomeID

And sometimes both are selected on:

... WHERE ParentID = @SomeID AND ChildID  = @SomeOtherID

I want to increase the performance of these queries but most notably the first two. Should I create a non-clustered index on ParentID + ChildID together or one index on ParentID and another index on ChildID?

EDIT: All of these queries are highly selective (1 or 2 records returned).

Upvotes: 2

Views: 3185

Answers (4)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52157

Can you get rid of the surrogate key ID?

If yes, consider creating the following:

  • A primary and clustering key on {ParentID, ChildID}.
  • A secondary index on {ChildID, ParentID}, but include the RelationshipType in the index as well (use the INCLUDE keyword).

This way, you have a covering index in all 3 cases, so you don't have to pay the price of the double-lookup (that is normally required for secondary indexes in clustered tables):

  • ... WHERE ParentID = @SomeID can be satisfied by a simple seek in the B-Tree of the index: {ParentID, ChildID}. The value of ChildID and RelationshipType1 can be retrieved directly from the found leaf of this B-Tree.
  • ... WHERE ChildID = @SomeID can be satisfied by a simple seek in the B-Tree of the index: {ChildID, ParentID}. The value of ParentID and RelationshipType2 can be retrieved directly from the found leaf of this B-Tree.
  • ... WHERE ParentID = @SomeID AND ChildID = @SomeOtherID can be satisfied by either.

1 The clustering key is the "main" B-Tree for the table and includes all columns, not just those that are unique.

2 Thanks to INCLUDE (RelationshipType).


Doing something similar with the ID present is possible, but would require 3 indexes instead of 2 and all of them would be fatter to achieve covering. You'd have to measure to make sure, but my feeling is that this would be more trouble than it's worth.

Otherwise, don't use clustering at all. Just create normal indexes on:

  • {ID} - regular, non-clustering primary index (use the NONCLUSTERED keyword).
  • {ParentID} - regular secondary index.
  • {ChildID} - regular secondary index.

You'll have a normal heap table, so each access will require an index seek + (usually) table heap access, but your indexes will be kept slim, raising the cache effectiveness.

... WHERE ParentID = @SomeID AND ChildID = @SomeOtherID would require two index seeks (or possibly a seek on either {ParentID} or {ChildID} index + table heap access), but this is still pretty fast and is not too frequent (as you stated).


Please do measure on realistic amounts of data before deciding either way.

Upvotes: 4

Tobsey
Tobsey

Reputation: 3400

I would create a single non-clustered index on each column but include the other column and also the RelationshipType column (I assume RelationshipType is the data you are retrieving):

CREATE NONCLUSTERED INDEX IX_RelationshipType_ParentID
ON
    RelationshipType(ParentID)
INCLUDE
    (ChildID, RelationshipType)
GO

CREATE NONCLUSTERED INDEX IX_RelationshipType_ChildID
ON
    RelationshipType(ChildID)
INCLUDE
    (ChildID, RelationshipType)
GO

This will result in the engine being able to get the desired data from the index once it has found the entry, without the need to return to the table after locating the item in the index.

Upvotes: 1

Dudeman3000
Dudeman3000

Reputation: 601

In a way, it's impossible to say what would give your database the best performance without knowing the frequency of each of these selects and how often inserts / update to the table occur, but here's my best guess:

It sounds like ParentID, ChildID might be you're primary key, which is by definition a clustered index.

The lazy way to do it here is to create two more non-clustered indexes on ParentID and ChildID. But... particularly the ParentID column, or whatever column comes first in your primary key / cluster index - I'm really not sure if you are going to get any select benefit at all if you create another non-clustered ParentID index. A non-clustered index is going to store a copy of your table ordered by that index, but in this case, the primary key dictates the ordering of your table, and it is already ordered, first, by ParentID.

So in conclusion, I would make a primary key of ParentID and ChildID and a non-clustered index on ChildID and I think you're good to go.

Upvotes: 1

Martin Smith
Martin Smith

Reputation: 453898

As you say looking up by either ParentId or ChildId is highly selective I'd just go for the two separate indexes.

SQL Server can then use either index for the WHERE ParentID = @SomeID AND ChildID = @SomeOtherID and evaluate the residual predicate on the one or two rows that match.

I suppose an exception would be if the table is more or less read only and the entire database fits in memory then there is no down side of having the additional index and it avoids a lookup to retrieve the missing column.

Upvotes: 1

Related Questions