mert
mert

Reputation: 2002

Can I have a primary key and a separate clustered index together?

Let's assume I already have a primary key, which makes sure uniqueness. My primary key is also ordering index for the records. However, I am curious about the primary key's task in physical order of records in the disk (if there is). And the actual question is can I have a separate clustered index for these records?

Upvotes: 2

Views: 3025

Answers (2)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52107

This is an attempt at testing the size and performance characteristics of a covering secondary index on a clustered table, as per discussion with @Catcall.

All tests were done on MS SQL Server 2008 R2 Express (inside a fairly underpowered VM).

Size

First, I crated a clustered table with a secondary index and filled it with some test data:

CREATE TABLE THE_TABLE (
    FIELD1 int,
    FIELD2 int NOT NULL,
    CONSTRAINT THE_TABLE_PK PRIMARY KEY (FIELD1)
);

CREATE INDEX THE_TABLE_IE1 ON THE_TABLE (FIELD2) INCLUDE (FIELD1);

DECLARE @COUNT int = 1;
WHILE @COUNT <= 1000000 BEGIN
    INSERT INTO THE_TABLE (FIELD1, FIELD2) VALUES (@COUNT, @COUNT);
    SET @COUNT = @COUNT + 1;
END;

EXEC sp_spaceused 'THE_TABLE';

The last line gave me the following result...

name        rows        reserved    data        index_size  unused
THE_TABLE   1000000     27856 KB    16808 KB    11008 KB    40 KB

So, the index's B-Tree (11008 KB) is actually smaller than the table's B-Tree (16808 KB).

Speed

I generated a random number within the range of the data in the table, and then used it as criteria for selecting a whole row from the table. This was repeated 10000 times and the total time measured:

DECLARE @I int = 1;
DECLARE @F1 int;
DECLARE @F2 int;
DECLARE @END_TIME DATETIME2;
DECLARE @START_TIME DATETIME2 = SYSDATETIME();

WHILE @I <= 10000 BEGIN

    SELECT @F1 = FIELD1, @F2 = FIELD2
    FROM THE_TABLE
    WHERE FIELD1 = (SELECT CEILING(RAND() * 1000000));

    SET @I = @I + 1;
END;

SET @END_TIME = SYSDATETIME();
SELECT DATEDIFF(millisecond, @START_TIME, @END_TIME);

The last line produces an average time (of 10 measurements) of 181.3 ms.

When I change the query condition to: WHERE FIELD2 = ..., so the secondary index is used, the average time is 195.2 ms.

Execution plans:

enter image description here

enter image description here

So the performance (of selecting on the PK versus on the covering secondary index) seems to be similar. For much larger amounts of data, I suspect the secondary index could possibly be slightly faster (since it seems more compact and therefore cache-friendly), but I didn't hit that yet in my testing.

String Measurements

Using varchar(50) as type for FIELD1 and FIELD2 and inserting strings that vary in length between 22 and 28 characters gave similar results.

The sizes were:

name        rows        reserved    data        index_size  unused
THE_TABLE   1000000     208144 KB   112424 KB   95632 KB    88 KB

And the average timings were: 254.7 ms for searching on FIELD1 and 296.9 ms fir FIELD2.

Conclusion

If a clustered table has a covering secondary index, that index will have space and time characteristics similar to the table itself (possibly slightly slower, but not by much). If effect, you'll have two B-Trees that sort their data differently, but are otherwise very similar, achieving your goal of having a "second cluster".

Upvotes: 3

It depends on your dbms. Not all of them implement clustered indexes. Those that do are liable to implement them in different ways. As far as I know, every platform that implements clustered indexes also provides ways to choose which columns are in the clustered index, although often the primary key is the default.

In SQL Server, you can create a nonclustered primary key and a separate clustered index like this.

create table test (
  test_id integer primary key nonclustered,
  another_column char(5) not null unique clustered
  );

I think that the closest thing to this in Oracle is an index organized table. I could be wrong. It's not quite the same as creating a table with a clustered index in SQL Server.

You can't have multiple clustered indexes on a single table in SQL Server. A table's rows can only be stored in one order at a time. Actually, I suppose you could store rows in multiple, distinct orders, but you'd have to essentially duplicate all or part of the table for each order. (Although I didn't know it at the time I wrote this answer, DB2 UDB supports multiple clustered indexes, and it's quite an old feature. Its design and implementation is quite different from SQL Server.)

A primary key's job is to guarantee uniqueness. Although that job is often done by creating a unique index on the primary key column(s), strictly speaking uniqueness and indexing are two different things with two different aims. Uniqueness aims for data integrity; indexing aims for speed.

A primary key declaration isn't intended to give you any information about the order of rows on disk. In practice, it usually gives you some information about the order of index entries on disk. (Because primary keys are usually implemented using a unique index.)

If you SELECT rows from a table that has a clustered index, you still can't be assured that the rows will be returned to the user in the same order that they're stored on disk. Loosely speaking, the clustered index helps the query optimizer find rows faster, but it doesn't control the order in which those rows are returned to the user. The only way to guarantee the order in which rows are returned to the user is with an explicit ORDER BY clause. (This seems to be a fairly frequent point of confusion. A lot of people seem surprised when a bare SELECT on a clustered index doesn't return rows in the order they expect.)

Upvotes: 3

Related Questions