Reputation: 25262
Say I have
- Clients table, PK is ClientId
- Products table, PK is ProductId
I need to store, for a few Clients, their internal Product Reference, so I create a Client-Product table:
CREATE TABLE [dbo].[Product-Client](
[IdProduct] [varchar](15) NOT NULL,
[IdClient] [varchar](10) NOT NULL,
[RefClient] [varchar](20) NOT NULL, --client's internal product Id
CONSTRAINT [aaaaaArticles-Clients_PK] PRIMARY KEY CLUSTERED -- sure ???
(
[IdClient] ASC,
[IdProduct] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Of course I will also add 2 FOREIGN KEY constraints to make sure that the ProductId exists and the Client exists.
I want a unique index on ProductId + ClientId.
I want an unique index on ClientId + RefClient.
Knowing that these Client References are quite static, so they will be rarely updated, but often read, my questions are:
Edit
For question 1 there are 3 possible answers of course:
a) ClientId+ProductId (a unique index will have to be created anyway)
b) ClientId+RefClient (a unique index will have to be created anyway)
c) a surrogate key
Upvotes: 2
Views: 2941
Reputation: 9166
The answer for question 1., in my humble opinion, is that the primary key should be on a combination of ClientId
and ProductId
. This indicates to a human that the table contains data that is relevant for this combination, while the RefClient
column contains the the data.
Which one of them to put first in the key can to some extent depend of the use case. Semantically, it makes sense to have ClientId
first, since (again, in my opinion) the table contains data that is primarily associated with the client. But from a micro-optimizing, squeeze every nano-second worth of performance out of it perspective, the answer could depend on which column will have more variation in the data. If there will be a few different ClientId
values, but many ProductId
values for each client value, there might be some small gain from putting the ProductId
first.
Regarding question 2., the answer is that it depends on use case, just like @swe wrote.
If we consider the table to be almost static, then my guess would be that a clustered index on the primary key ClientId + ProductId
would be ideal.
This is based on the assumption that one client searches for many consecutive products
is a more common use case than we want all clients with a specific id for this product
. This would then mean that several rows of data from the table could be found on the same datapage, thereby reducing IO (reading from disc).
All in all though, I think semantics should be what guides your decision. Trying to squeeze the ultimate amount of performance out of this seems very much like premature optimization. So with that said, I suggest you use ClientId + ProductId
as both primary key and clustered index.
Upvotes: 1
Reputation: 7692
There are many books written on the topic of how to choose best indices. In short, it depends on how you will access the data.
There are also well-defined criteria on choosing the most suitable clustered index. If you can predict types of queries that will access the table, you can choose one and arrange its columns in the most efficient order.
Starting from SQL Server 2005, there is a built-in index suggesting functionality, which you can use to improve your indices. Look up this reference, for a start. Also, you can find plenty of ready-made scripts on the internet that utilise this feature. However, as with all automated recommendations, it should not be followed blindly - you have to understand the benefits and drawbacks of every index created.
In short, nobody here can predict indices optimal for your particular type of workload. However, missing index suggestions backed by the actual query statistics collected over a significant period of time can be the next best thing.
Upvotes: 0
Reputation: 1455
The primary key has to be unique. It has nothing to do with the storage on disk, BUT is used by SQL-Server-Management-Studio as Default Clustered Index. The ideal PK is the combination of the smallest amount of values beeing unique in every situation.
The clustered index should be build on your most common query-where-clause.
BUT there are hundered of other points to think about, if you REALLY want the best answer, you have to give few more details, including but not limited to:
How often is your table beeing written in parallel, how often do values change, how often do you send which query...
Upvotes: 0