NedStarkOfWinterfell
NedStarkOfWinterfell

Reputation: 5203

Original insertion order with Clustered Index

I have a question regarding clustered index.

In clustered indices, the leaf level nodes themselves hold the data in sorted order, right?

That is, with each and every insertion/updation/deletion, the nodes are reshuffled to maintain the sorted order.

So how can the data be retrieved from it in the order they were inserted?

Imagine the following data is inserted in the order given: 1,7,4,5,2 and a clustered index is created on this field.

So the data will be stored internally in the order 1,2,4,5,7 right?

So that might facilitate faster lookup for a particular value, but what if the user wants the first 3 values in the order he inserted?

Are they somehow retrievable, or do I have to assign an incremental id for each row inserted, declare a non-clustered index on that, and provide the data for the first 3 records based on sorting the records on that id-field?

Upvotes: 1

Views: 602

Answers (3)

user1293068
user1293068

Reputation:

The table data is sorted according to the order of the clustered index. You can only have ONE clustered index on each table ,if you like to check the first 3 values in the order he inserted ,

USE AdventureWorks

go

CREATE TABLE myTable99(
Col1 int IDENTITY(1,1) PRIMARY KEY , Col2 Char(1) , Col3 datetime DEFAULT getdate()

) GO

INSERT INTO myTable99(Col2) SELECT 'A' UNION ALL SELECT 'B' UNION ALL SELECT 'C' GO

SELECT * FROM myTable99 ORDER BY 3 GO

DROP TABLE myTable99 GO

Other method could be :

CREATE TABLE CounterData]( [CounterDataID] [bigint] IDENTITY(1,1) NOT NULL, [DateTimeID] [bigint] NOT NULL, [Value] [float] NULL ) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [IX_DateTime_CounterDataID] ON [PK].[CounterData]

(

[DateTimeID] ASC,
[CounterDataID] ASC

)

WITH

(PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO

Upvotes: 1

Andrew
Andrew

Reputation: 27294

(Answers based on SQL Server - question doesn't 100% specify)

In clustered indices, the leaf level nodes themselves hold the data in sorted order, right?

That is not quite correct, the data can be stored in any order on the leaf, but the slot array on the page is in effect the order the data is read from off the page - not the physical order of the data.

That is, with each and every insertion/updation/deletion, the nodes are reshuffled to maintain the sorted order.

Nodes (e.g. pages are split and the forward / backward pointers on the double link list change), but inside a page, the slot arrays is still the entity that retains the order, rows themselves would not be shuffled to match the slot array order.

So how can the data be retrieved from it in the order they were inserted?

It's not normally guarenteed that it will be in the exact order - that tends to happen more on a heap page, where the slot array is more representative of the order, but again, not guarenteed.

Imagine the following data is inserted in the order given: 1,7,4,5,2 and a clustered index is created on this field. So the data will be stored internally in the order 1,2,4,5,7 right?

No, it will be stored 1,7,4,5,2 on the page, but the slot array would read the addresses on the page as 7,5,4,2,1 (its constructed from the end of the page backwards, so you read in reverse.)

So that might facilitate faster lookup for a particular value, but what if the user wants the first 3 values in the order he inserted?

Kind of immaterial in this case - aside from no such guarentee about ordering, SQL will read the whole page into memory. If you want to know more about SQL Internals at this kind of level, I would still recommend Kalen Delaneys SQL Internals book as one of the best sources.

If you want any information about insertion order, I suggest some kind of inserted_timestamp

Upvotes: 2

Hogan
Hogan

Reputation: 70529

Sounds like you want a timestamp on your rows to me. I typically put the following columns on all tables I create (for auditing):

timecreated
timemodified
createdby
modifiedby
deleted

These columns let you know who created the row and when, when it was last modified and by who and optionally "soft delete" the row by setting deleted to true. Of course all your other queries in the system will have to check the deleted boolean for the soft delete to work.

Upvotes: 1

Related Questions