iDevlop
iDevlop

Reputation: 25262

Primary key & clustered index

Say I have
- Clients table, PK is ClientId
- Products table, PK is ProductId
I need to store, for a few Clients, their internal Product Reference, so I create a Client-Product table:

CREATE TABLE [dbo].[Product-Client](
    [IdProduct] [varchar](15) NOT NULL,
    [IdClient] [varchar](10) NOT NULL,
    [RefClient] [varchar](20) NOT NULL,       --client's internal product Id
 CONSTRAINT [aaaaaArticles-Clients_PK] PRIMARY KEY CLUSTERED   -- sure ???
(
    [IdClient] ASC,
    [IdProduct] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

Of course I will also add 2 FOREIGN KEY constraints to make sure that the ProductId exists and the Client exists.
I want a unique index on ProductId + ClientId.
I want an unique index on ClientId + RefClient.
Knowing that these Client References are quite static, so they will be rarely updated, but often read, my questions are:

  1. what is the ideal PK ?
  2. which index should be the CLUSTERED one ?

Edit
For question 1 there are 3 possible answers of course:
a) ClientId+ProductId (a unique index will have to be created anyway)
b) ClientId+RefClient (a unique index will have to be created anyway)
c) a surrogate key

Upvotes: 2

Views: 2941

Answers (3)

user1429080
user1429080

Reputation: 9166

The answer for question 1., in my humble opinion, is that the primary key should be on a combination of ClientId and ProductId. This indicates to a human that the table contains data that is relevant for this combination, while the RefClient column contains the the data.

Which one of them to put first in the key can to some extent depend of the use case. Semantically, it makes sense to have ClientId first, since (again, in my opinion) the table contains data that is primarily associated with the client. But from a micro-optimizing, squeeze every nano-second worth of performance out of it perspective, the answer could depend on which column will have more variation in the data. If there will be a few different ClientId values, but many ProductId values for each client value, there might be some small gain from putting the ProductId first.

Regarding question 2., the answer is that it depends on use case, just like @swe wrote.

If we consider the table to be almost static, then my guess would be that a clustered index on the primary key ClientId + ProductId would be ideal.

This is based on the assumption that one client searches for many consecutive products is a more common use case than we want all clients with a specific id for this product. This would then mean that several rows of data from the table could be found on the same datapage, thereby reducing IO (reading from disc).

All in all though, I think semantics should be what guides your decision. Trying to squeeze the ultimate amount of performance out of this seems very much like premature optimization. So with that said, I suggest you use ClientId + ProductId as both primary key and clustered index.

Upvotes: 1

Roger Wolf
Roger Wolf

Reputation: 7692

There are many books written on the topic of how to choose best indices. In short, it depends on how you will access the data.

There are also well-defined criteria on choosing the most suitable clustered index. If you can predict types of queries that will access the table, you can choose one and arrange its columns in the most efficient order.

Starting from SQL Server 2005, there is a built-in index suggesting functionality, which you can use to improve your indices. Look up this reference, for a start. Also, you can find plenty of ready-made scripts on the internet that utilise this feature. However, as with all automated recommendations, it should not be followed blindly - you have to understand the benefits and drawbacks of every index created.

In short, nobody here can predict indices optimal for your particular type of workload. However, missing index suggestions backed by the actual query statistics collected over a significant period of time can be the next best thing.

Upvotes: 0

swe
swe

Reputation: 1455

The primary key has to be unique. It has nothing to do with the storage on disk, BUT is used by SQL-Server-Management-Studio as Default Clustered Index. The ideal PK is the combination of the smallest amount of values beeing unique in every situation.

The clustered index should be build on your most common query-where-clause.

BUT there are hundered of other points to think about, if you REALLY want the best answer, you have to give few more details, including but not limited to:

How often is your table beeing written in parallel, how often do values change, how often do you send which query...

Upvotes: 0

Related Questions