Reputation: 555
Does page splitting affect the physical sort of data in an index or the table in general? EXAMPLE: If you had a indexed email column and page splitting was occurring a lot, would the email addressed become out of order on the list, and now longer be in order?
How does page splitting/fragmentation work with a primary key record being deleted? EXAMPLE: If you have the following table with the primary keys being the numbers.
1, bob, chair
2, joe, table
3, brandon, lamp
4, jared, tv
Lets say you delete record 3. Now the table reads 1, 2, 4 (out of order), and then record 3 is re-inserted at a later time? It will then read 1,2,4,3. Does record 4 takes record 3's space or is it reserved in case record 3 is re-inserted at a later time? Or do you have to perform an index rebuild to physically re-sort it?
Thank you
Upvotes: 4
Views: 2011
Reputation: 8324
Let's take a look at your example and see what happens.
First, dummy table and dummy data.
CREATE TABLE testing (ID INT PRIMARY KEY, junk VARCHAR(100))
GO
INSERT INTO testing(ID, junk)
VALUES
(1, 'blah'),
(2, 'blahhhh'),
(3, 'Blabbb')
GO
Now, lets take a look at the actual data page this is stored on.
SELECT
*,
sys.fn_PhysLocFormatter(%%physloc%%) AS Page_Location
from testing
Take the output of that function and put it into DBCC PAGE
DBCC TRACEON(3604) --Send output of DBCC PAGE to SSMS Client
GO
DBCC PAGE('your_db_name',the_first_num_from_page_location,the_second_num_from_page_location,3)
This will show you what the data page actually has on it. 'Slots' are rows in your table and the page starts at Slot 0.
Slot 0 Offset 0x60 Length 19
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 19
Memory Dump @0x0000004493BFA060
0000000000000000: 30000800 01000000 02000001 00130062 6c6168 0..............blah
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
ID = 1
Slot 0 Column 2 Offset 0xf Length 4 Length (physical) 4
junk = blah
Slot 0 Offset 0x0 Length 0 Length (physical) 0
KeyHashValue = (8194443284a0)
Keep reading the page and you will see your 3 rows of data. Now, delete record 2 and rerun the page_location query and relook at the page. Slot 1 should now be your ID=3 record.
DELETE FROM testing
WHERE ID = 2
Now, rebuild your PK index and reinsert record 2.
ALTER INDEX your_pk here ON testing
REBUILD
INSERT INTO testing(ID, Junk)
VALUES
(2,'blerg')
GO
Rerun the page_location query since the pages probably changed after the rebuild and look at your slots. Even though you inserted 2 after 3, slot 1 (row 2) is your ID =2 record.
However, there is something called the Slot Array on the data page that stored the logical order of the data. So even though the data on the page might be physically stored totally out of order, the slot array tell sql server how it should be logically sorted based on the index.
You can see this in action here:
DBCC TRACEON(3604) --Send output of DBCC PAGE to SSMS Client
GO
DBCC PAGE('your_db_name',the_first_num_from_page_location,the_second_num_from_page_location,2)
At the very bottom, you will see the offset table: OFFSET TABLE:
Row - Offset
2 (0x2) - 115 (0x73)
1 (0x1) - 136 (0x88)
0 (0x0) - 96 (0x60)
Rebuild your PK again and see how it changes:
OFFSET TABLE:
Row - Offset
2 (0x2) - 135 (0x87)
1 (0x1) - 115 (0x73)
0 (0x0) - 96 (0x60)
So, the data on the page might be totally out of order physically, but the slot array tells the engine which order it should be in based on your index definition. Your results in SSMS or on the data page might look to be in physically sorted order, but in reality it is probably totally different and the slot array is keeping track of what order it should be in. Reinserting PK rows out of order could definitely cause page splits if it has to push data around.
Upvotes: 0
Reputation: 27294
1.) The clustered index is a logical order not physical, avoid trying to think of it as a real physical order, there are too many levels of abstraction between the index and the storage to consider physical order. (Extent allocation, Filegroups, Fragmentation on a disk / Lun, Lun's themselves - all these things make the notion of a physical order incorrect.)
Page splitting is handled in a way in which the logical order is maintained, the forward / reverse double linked list linking pages for a table are updated to account for the new page inserting it in 'order' so to speak.
2.) Physical order vs logical - again, just stop thinking about physical ordering, even if the rows are on the same page, the slot array of the page is the one maintaining the actual order of the data on the page itself. So while on the page it might end up 1,2,4,3, the slot array will be 4,3,2,1 (Slot array starts at the end of the page and is read backwards) - whether the space is re-used or not doesn't really matter, since it will handle the offset positions on the page vs order to be maintained via that array.
Impaler mentions using an auto-increment value for the primary key - the primary key and the clustered key do not have to be the same thing, while I would agree that you often use an identity field for a clustered key - the primary key should be considered separately based on needs and the candidate keys available.
How SQL Server works at a storage level can get quite a complex topic, personally I would recommend Kalen Delaney's book, although others are available.
Upvotes: 3
Reputation: 48850
SQL Server uses clustered index tables (by default). These tables is sorted by their primary key.
When you insert an intermediate value the pages are rearranged to allocate space for the new row. As far as I understand, when you remove a row the space is reclaimed afterwards (at some point). These operations are expensive to perform.
This is why is usually recommended to use a auto-incrementing value for the primary key, that avoid the problem you are mentioning.
Upvotes: 0