Reputation: 101
Using the code:
Using bcp As New SqlBulkCopy(destConnection)
    bcp.DestinationTableName = "myOutputTable"
    bcp.BatchSize = 10000
    bcp.WriteToServer(reader)
End Using
Where reader is essentially an IDataReader that reads in a table containing 200k rows or so.
The input table looks like this:
CREATE TABLE [dbo].[MyTable](
    [TagIndex] [SMALLINT] NOT NULL,
    [TimeStamp] [DATETIME] NOT NULL,
    [RawQuality] [SMALLINT] NOT NULL,
    [ValQuality] [SMALLINT] NOT NULL,
    [Sigma] [REAL] NULL,
    [Corrected] [REAL] NULL,
    [Raw] [REAL] NULL,
    [Delta] [REAL] NULL,
    [Mean] [REAL] NULL,
    [ScadaTimestamp] [DATETIME] NOT NULL
) ON [PRIMARY]
It is ordered by TimeStamp.
The output table has the same structure, is empty at the beginning of the process, and has the following index:
CREATE CLUSTERED INDEX [MyOutputTable_Index] ON [dbo].[MyOutputTable]
(
    [TimeStamp] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
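For reference, fragmentation figures like the ones below can be measured with the sys.dm_db_index_physical_stats DMV. A minimal sketch, run in the database that holds the output table:

-- Report logical fragmentation for each level of the clustered index
SELECT ips.index_level,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.MyOutputTable'), NULL, NULL, 'DETAILED') AS ips
ORDER BY ips.index_level;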
Artificially throttling the process to run small(ish) amounts of data into the output table at a time (fewer than about 35k rows) results in fragmentation of < 5%, and everything is fine.
But if I run in a larger chunk, say 45k (or let the process run the whole 200k), the fragmentation becomes 99%+.
To be precise: if I run in 39,773 rows I get < 5% fragmentation, but if I run in 39,774 I get 99% fragmentation. And if I investigate the pages allocated in the index using DBCC PAGE, I see the following:
FileId  PageId  Row  Level  ChildFileId  ChildPageId  TimeStamp (key)
1       18937   0    1      1            18906        2015-10-22 01:37:32.497
1       18937   1    1      1            18686        2015-10-22 01:38:12.497
1       18937   2    1      1            18907        2015-10-22 01:38:47.497
1       18937   3    1      1            18687        2015-10-22 01:39:27.497
1       18937   4    1      1            18908        2015-10-22 01:40:02.497
1       18937   5    1      1            18688        2015-10-22 01:40:42.497
1       18937   6    1      1            18909        2015-10-22 01:41:17.497
1       18937   7    1      1            18689        2015-10-22 01:41:57.497
1       18937   8    1      1            18910        2015-10-22 01:42:32.497
Looking at the ChildPageId column, we can see that the numbers do not run consecutively: 18906 is followed by 18686, which is followed by 18907. The series beginning at 18686 is interleaved with the series beginning at 18906, causing fragmentation of over 99%.
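For reference, a page dump like the one above can be produced with the undocumented DBCC PAGE command (the database name below is a placeholder; file 1 and page 18937 are taken from the listing):

-- Route DBCC output to the client session instead of the error log
DBCC TRACEON (3604);
-- DBCC PAGE (database, file id, page id, print option); option 3 includes the rows
DBCC PAGE ('MyDatabase', 1, 18937, 3);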
So, the question is: what is causing the index to be built like this when running in larger chunks of data?
Upvotes: 4
Views: 303
Reputation: 101
For anyone who is interested in the outcome of this problem:
I found that the fragmentation was being caused by a variable number of duplicates within the table.
To explain: as the index is non-unique, it is possible, and is indeed the case, that the TimeStamp values are duplicated. If the duplicate groups were all of the same size, for example a consistent 7 rows per timestamp throughout the file, then the fragmentation would be < 1% on completion. But if the group sizes were more variable, with some chunks of data at 7 rows per timestamp, some at 6, some at 5, and so on, then fragmentation would exceed 99%, with the interleaving described in the question.
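To see how variable the duplicate groups are, something along these lines can be run against the input table (a sketch, not part of the original investigation):

-- Size of each duplicate group: rows per TimeStamp value
SELECT [TimeStamp], COUNT(*) AS RowsPerTimeStamp
FROM [dbo].[MyTable]
GROUP BY [TimeStamp]
ORDER BY [TimeStamp];

-- Distribution of group sizes: how many timestamps have 5 rows, 6 rows, 7 rows, ...
SELECT g.RowsPerTimeStamp, COUNT(*) AS TimeStampCount
FROM (SELECT COUNT(*) AS RowsPerTimeStamp
      FROM [dbo].[MyTable]
      GROUP BY [TimeStamp]) AS g
GROUP BY g.RowsPerTimeStamp
ORDER BY g.RowsPerTimeStamp;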
Upvotes: 2
Reputation: 3993
It's hard to tell without more data, but I bet that your TimeStamp column being the clustered index key is causing this. Try sorting the data by this field before sending it to the output table.
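A minimal sketch of that suggestion, assuming the reader is fed from a query against the input table (the question notes the input is already ordered, so this may already be in place):

-- Source query for SqlBulkCopy, ordered by the output table's clustering key
SELECT [TagIndex], [TimeStamp], [RawQuality], [ValQuality],
       [Sigma], [Corrected], [Raw], [Delta], [Mean], [ScadaTimestamp]
FROM [dbo].[MyTable]
ORDER BY [TimeStamp];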
Upvotes: 2