scherand

Reputation: 2358

SQL Server Clustered Index: (Physical) Data Page Order

I am struggling to understand what a clustered index in SQL Server 2005 is. I read the MSDN article Clustered Index Structures (among other things), but I am still unsure whether I understand it correctly.

The (main) question is: what happens if I insert a row (with a "low" key) into a table with a clustered index?

The above-mentioned MSDN article states:

The pages in the data chain and the rows in them are ordered on the value of the clustered index key.

And Using Clustered Indexes, for example, states:

For example, if a record is added to the table that is close to the beginning of the sequentially ordered list, any records in the table after that record will need to shift to allow the record to be inserted.

Does this mean that if I insert a row with a very "low" key into a table that already contains a gazillion rows, literally all rows are physically shifted on disk? I cannot believe that. This would take ages, no?

Or is it rather (as I suspect) that there are two scenarios, depending on how "full" the first data page is?

This would then mean that the "physical order" of the data is restricted to the "page level" (i.e. within a data page) and does not extend to the pages residing on consecutive blocks on the physical hard drive. The data pages are then just linked together in the correct order.

Or, formulated in an alternative way: if SQL Server needs to read the first N rows of a table that has a clustered index, it can read the data pages sequentially (following the links), but these pages are not (necessarily) block-wise in sequence on disk (so the disk head has to move "randomly").
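To make that mental model concrete, here is a minimal toy sketch in Python. It is not how SQL Server stores anything; the page numbers, the `pages` dictionary, and the `scan` helper are all hypothetical. It only illustrates pages that are sorted internally and linked in key order, while their physical positions are out of order.

```python
# Toy model (hypothetical): the dict key stands for a physical position on disk,
# "rows" holds the sorted keys on that page, "next" links to the next page in key order.
pages = {
    0: {"rows": [1, 2, 3], "next": 2},
    1: {"rows": [7, 8, 9], "next": None},
    2: {"rows": [4, 5, 6], "next": 1},
}

def scan(pages, first_page):
    """Follow the page links and return (rows in key order, physical pages visited)."""
    rows, visited = [], []
    page_id = first_page
    while page_id is not None:
        visited.append(page_id)
        rows.extend(pages[page_id]["rows"])
        page_id = pages[page_id]["next"]
    return rows, visited

rows, visited = scan(pages, 0)
print(rows)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(visited)  # [0, 2, 1]
```

The rows come back in key order even though the scan jumps from physical position 0 to 2 and then back to 1.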

How close am I? :)

Upvotes: 2

Views: 3891

Answers (2)

marc_s

Reputation: 754468

If you happen to insert a row with a "low" ID as you say, then yes - it will be placed in the vicinity of your other rows that are already there with similar IDs.

If your SQL Server page (8K chunks) is filled to the max, then a page split will occur - roughly half the rows will remain on that page, and the other half will be moved to a new page. Both pages will then have some capacity for new rows.
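As a rough illustration of the split described above, here is a toy Python sketch. It is an assumption-laden simplification, not SQL Server's actual algorithm: a page is just a sorted list, and `PAGE_CAPACITY` and `insert_key` are hypothetical names made up for this example.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical rows per page; a real page is 8 KB

def insert_key(page, key, capacity=PAGE_CAPACITY):
    """Insert key into a sorted page; return the new page if a split was needed, else None."""
    if len(page) < capacity:
        bisect.insort(page, key)  # room left: the row just goes in at its sorted position
        return None
    # Page is full: move the upper half of the rows to a new page.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    # The new row goes to whichever half covers its key.
    target = page if key < new_page[0] else new_page
    bisect.insort(target, key)
    return new_page

page = [10, 20, 30, 40]          # a full page
new_page = insert_key(page, 5)   # inserting a "low" key forces a split
print(page)      # [5, 10, 20]
print(new_page)  # [30, 40]
```

Only the rows on the one affected page are moved; no other page in the table is touched.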

That's one of the reasons why you don't want to use something as your clustering key that is very random, e.g. a GUID, which will cause rows to be inserted all over the place.

Trying to avoid page splits (which are quite expensive operations) is one of the main reasons why gurus like Kimberly Tripp heavily advocate using something that is ever increasing as your clustering key - e.g. an INT IDENTITY column. Here, a new value is always guaranteed to be larger than anything that's already in your database, so new rows are always added at the "end" of the food chain.
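The effect of key order on splits can be sketched with a small simulation. This is a toy model under stated assumptions, not a measurement of SQL Server: pages are sorted lists holding at most `PAGE_CAPACITY` keys (a made-up number), and an insert past the end of the last page is modeled as simply starting a fresh page without moving any rows. The function name `count_splits` is hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical rows per page

def count_splits(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into sorted pages; count splits that had to move rows."""
    pages = [[]]  # pages kept in key order
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        firsts = [p[0] for p in pages[1:]]
        i = bisect.bisect_right(firsts, key)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Toy assumption: appending past the end just starts a new page.
            pages.append([key])
        else:
            # Full page in the middle of the key range: split it in half.
            mid = len(page) // 2
            upper = page[mid:]
            del page[mid:]
            pages.insert(i + 1, upper)
            bisect.insort(page if key < upper[0] else upper, key)
            splits += 1
    return splits

increasing = list(range(100))                    # like an INT IDENTITY column
scattered = random.Random(0).sample(range(10**6), 100)  # stand-in for GUID-like keys

print("ever-increasing keys:", count_splits(increasing))
print("random keys:", count_splits(scattered))
```

In this model, ever-increasing keys never trigger a row-moving split, while keys arriving in scattered order keep landing on pages that are already full.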

For more excellent background info, see Kimberly Tripp's blog - especially her Clustering Key category!

Upvotes: 2

Daniel Renshaw

Reputation: 34177

How close are you? Very!

These articles may help consolidate your understanding:

http://msdn.microsoft.com/en-us/library/aa964133(SQL.90).aspx

http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx

Upvotes: 1
