fdkgfosfskjdlsjdlkfsf
fdkgfosfskjdlsjdlkfsf

Reputation: 3301

If clustered index is table data, how can it not be unique?

I was seaching how to move a table from one filegroup to the other, and I had some doubts as to why most of the replies I found dealt with clustered indexes, considering that my question had to do with tables.

Then I looked at How I can move table to another filegroup?, and it says that the clustered index is the table data, which explains the reasoning behind recreating a clustered index with CREATE CLUSTERED INDEX.

But in that same question it says that if my clustered index is unique, then do something else.

My question: I assume that when I create tables on a database, a clustered index is created for that table. So how can it not be unique?

Thanks.

Upvotes: 2

Views: 1600

Answers (4)

usr
usr

Reputation: 171246

If you have an int array and you store the number 1 twice in it - how can that array not be unique?! (Trick question to get you thinking. It clearly can be not unique.) Being unique is a constraint on the data. Fundamentally, there is nothing preventing you from creating multiple rows that have the same values in all columns.

In a heap this is not a problem physically at all. The internal row identifier is it's location on disk.

In a b-tree based index (a "clustered index") the physical data structure indeed requires uniqueness. Note, that the logical structure (the table) does not. This is a physical concern. It's an implementation detail. SQL Server does this by internally appending a key column that contains a sequence number that is counted upwards. This disambiguates the records. You can observe this effect by creating more than 2^32 rows with the same non-unique key. You will receive an error.

So there's a hidden column in the table that you cannot access. It's officially called "uniqueifier". Internally, it's used to complete the CI key to make it unique. It's stored and used everywhere where normally the unique CI key would be used: In the CI, in non-unique NCIs, in the lock hash and in query plans.

Upvotes: 3

clifton_h
clifton_h

Reputation: 1298

**usr* has a good post worth reading. I will add here from Microsofts Documentation.

First, you are not alone with Clustered-Indexes. Honestly, the name itself is somewhat confusing (Structured-Indexes or Disk-Indexes would probably be better in SQL).

Refer back to the official documentation from MSDN. Any alterations by me are in italics:

A Clustered Index is an on-disk structure of the table. This means the values are pointing to a physical location. This is why when you move the table you need to recreate the Index because the physical location has been altered.

Clustered

  • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

  • The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Nonclustered

  • Nonclustered indexes have a structure separate from the data rows (like pointers, this is a logical ordering of the data that consumes a fraction of the physical disk space).

  • A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.

  • The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table (think ordered).

  • For a heap, a row locator is a pointer to the row.
  • For a clustered table, the row locator is the clustered index key.

ABSTRACT VIEW:

  1. A table created is not necessarily a clustered (ordered) table.
  2. An index does not necessarily have to be unique. It is an abstract view of the table.
    • Unique means that a value or set of values will not repeat themselves. If you wish to enforce this, you can add a constraint by the index (i.e. UNIQUE CLUSTERED INDEX) or a CONSTRAINT such as PRIMARY KEY if you wish this to be managed in the table structure itself.
  3. You may have multiple unique indexes since as long as the values are represented logically, they will not share the same value as another row pointer.

Consider you have Columns A, B, and C in a given table.

Column A was created with a UNIQUE CLUSTERED INDEX. This means that either A already had an enforceable UNIQUE constraint (like PK, UNIQUE CONSTRAINT) or was DECLARED EXPLICITLY.

A Column Group {B,C} could be a unique index so long as B and C never repeat itself together. In the same way, you could theoretically have indexes with the groups {A}, {B,C}, {A,C}, and every one of them be unique. Recall that an index is a logical ordering of the data so they likely will not have the same logical value (and thus are unique).

HOWEVER: unless the datatype, constraint (including the INDEX constraint), or table structure enforces a unique constraint on a COLUMN, you should not assume the index is unique. Furthermore, you cannot create a UNIQUE index if there are more than one rows containing the same combination of NULL values since SQL Server will treat them as the same value (NULL being unknown).

Will SQL Server use your indexes, unique or not? Well that is another story and depends on a number of things. But hopefully you find this post helpful.

Sources: MSDN - Clustered and Nonclustered Indexes Described

Upvotes: 1

Kannan Kandasamy
Kannan Kandasamy

Reputation: 13969

If Clustered Index is not unique then SQL Server internally creates Uniquifier to make uniqueness on that record. I will try to explain with an example:

CREATE TABLE Test2 (Col1 INT, Col2 INT)

CREATE CLUSTERED INDEX idxClustered ON Test2 (Col1) 
CREATE NONCLUSTERED INDEX idxNonClustered ON test2 (Col2) 

Here cluserered index is not unique

INSERT INTO Test2 VALUES (1,1), (2,2) 
INSERT INTO Test2 VALUES (3,3)
INSERT INTO Test2 VALUES (3,3)

--Get the Page Number of the Non Clustered Index
DBCC IND (Test, Test2, -1)

--Examine the Results of the Page
--Not to run in production
DBCC TRACEON (3604); 
DBCC PAGE(Test, 1, 3376, 3); 

You will see Uniquifier key with corresponding uniqueness value... If your clustered index is Unique Clustered Index then It will not have that Uniquifier attribute.

Upvotes: 1

SlimsGhost
SlimsGhost

Reputation: 2909

A clustered index doesn't have to be unique. But, there can be only one clustered index on a table, because a clustered index actually determines the physical order of the table rows on disk (but I find it confusing to say that the clustered index is the table data, per se, even though they are strongly tied to each other).

HERE is a good post all about non-unique clustered indexes. Even if the index was the entire row of data, you can certainly have duplicate rows (no PK), which would equate to duplicate clustered index nodes.

Upvotes: 0

Related Questions