Tomalak
Tomalak

Reputation: 338228

What is the optimal indexing strategy for a relation table?

A relation table is the common solution to representing a many-to-many (m:n) relationship.

In the simplest form, it combines foreign keys referencing the two relating tables to a new composite primary key:

A        AtoB     B
----     ----     ----
*id      *Aid     *id
data     *Bid     data

How should it be indexed to provide optimal performance in every JOIN situation?

  1. clustered index over (Aid ASC, Bid ASC) (this is mandatory anyway, I guess)
  2. option #1 plus an additional index over (Bid ASC, Aid ASC)
  3. or option #1 plus an additional index over (Bid ASC)
  4. any other options? Vendor-specific stuff, maybe?

Upvotes: 13

Views: 3145

Answers (3)

Quassnoi
Quassnoi

Reputation: 425391

I made some tests, and here is the update:

To cover all possible cases, you'll need to have:

CLUSTERED INDEX (a, b)
INDEX (b)

This will cover all JOIN sutiations AND ORDER BY

Note that an index on B is actually sorted on (B, A) since it references clustered rows.

As long as your a and b tables have PRIMARY KEY's on id's, you don't need to create additional indexes to handle ORDER BY ASC, DESC.

See the entry in my blog for more details:

Upvotes: 9

edosoft
edosoft

Reputation: 17271

I have done some quick and dirty tests by examining the execution plans in SQL server 2005. The plans showed that SQL uses the clustered index on Aid,Bid for most queries. Adding an index on Bid (ASC) shows that it's used for queries of type

select * from A
    inner join AtoB on Aid = A.id
    inner join B on Bid = B.id
where Bid = 1

So I'm voting for solution #3.

Upvotes: 1

Mehrdad Afshari
Mehrdad Afshari

Reputation: 421998

I guess solution 2 is optimal. I'd choose the order of the clustered index by looking at the values and expecting which one has more distinct rows. That one goes first. Also it's important to have unique or primary key indexes on parent tables.

Depending on DBMS, number 3 might work as good as number 2. It might or might not be smart enough to consider the values (key of clustered index) in the nonclustered index for anything other than refering the the actual row. If it can use it, then number 3 would be better.

Upvotes: 1

Related Questions