Paul

Reputation: 101

Why do I get high fragmentation when using SqlBulkCopy to move large amounts of data between databases?

Using the code

Using bcp As New SqlBulkCopy(destConnection)
     bcp.DestinationTableName = "myOutputTable"
     bcp.BatchSize = 10000 ' commit in batches of 10,000 rows

     bcp.WriteToServer(reader)
End Using

Where reader is essentially an IDataReader that reads in a table containing 200k rows or so.

The input table looks like this

CREATE TABLE [dbo].[MyTable](
    [TagIndex] [SMALLINT] NOT NULL,
    [TimeStamp] [DATETIME] NOT NULL,
    [RawQuality] [SMALLINT] NOT NULL,
    [ValQuality] [SMALLINT] NOT NULL,
    [Sigma] [REAL] NULL,
    [Corrected] [REAL] NULL,
    [Raw] [REAL] NULL,
    [Delta] [REAL] NULL,
    [Mean] [REAL] NULL,
    [ScadaTimestamp] [DATETIME] NOT NULL
) ON [PRIMARY]

And is ordered by TimeStamp.

The output table has the same structure and the following index (and is empty at the beginning of the process).

CREATE CLUSTERED INDEX [MyOutputTable_Index] ON [dbo].[MyOutputTable]
(
    [TimeStamp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Artificially throttling the process to run small(ish) amounts of data into the output table at a time (under about 35k rows) results in fragmentation of < 5%, and everything is fine.

But if I run in a larger chunk, say 45k (or let the process run the whole 200k), the fragmentation becomes 99%+.
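For reference, these fragmentation figures can be read from sys.dm_db_index_physical_stats; a minimal sketch, assuming the table name from the question:

SELECT index_level, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                          -- current database
         OBJECT_ID('dbo.MyOutputTable'),   -- the table being bulk loaded
         NULL, NULL,                       -- all indexes, all partitions
         'LIMITED');                       -- fast scan; reports leaf-level fragmentation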

To be precise, if I run in 39,773 rows I get < 5% fragmentation, but if I run in 39,774 I get 99% fragmentation. And if I investigate the pages allocated in the index using DBCC PAGE, I see the following.

FileId  PageId  Row Level   ChildFileId ChildPageId TimeStamp (key)
1       18937   0   1       1           18906       2015-10-22 01:37:32.497
1       18937   1   1       1           18686       2015-10-22 01:38:12.497
1       18937   2   1       1           18907       2015-10-22 01:38:47.497
1       18937   3   1       1           18687       2015-10-22 01:39:27.497
1       18937   4   1       1           18908       2015-10-22 01:40:02.497
1       18937   5   1       1           18688       2015-10-22 01:40:42.497
1       18937   6   1       1           18909       2015-10-22 01:41:17.497
1       18937   7   1       1           18689       2015-10-22 01:41:57.497
1       18937   8   1       1           18910       2015-10-22 01:42:32.497
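A page dump like the one above can be produced with DBCC PAGE (an undocumented but widely used command); a sketch, assuming a database name of MyDb and using the page id from the output above:

DBCC TRACEON (3604);              -- route DBCC output to the client instead of the error log
DBCC PAGE ('MyDb', 1, 18937, 3);  -- database, file id, page id; print option 3 = per-row detail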

Looking at the ChildPageId column, we can see that the numbers do not run consecutively.

For example, 18906 is followed by 18686, which is followed by 18907: the series beginning at 18686 is interleaved with the series beginning at 18906, causing fragmentation of over 99%.

So, the question is: what is causing the index to be built like this when running in larger chunks of data?

Upvotes: 4

Views: 303

Answers (2)

Paul

Reputation: 101

For anyone who is interested in the outcome of this problem:

I found that the fragmentation was being caused by a variable number of duplicates within the table.

To explain: since the index is non-unique, it is possible, and indeed the case, that the TimeStamps are duplicated. If the duplicate groups were all the same size, say a consistent 7 rows per timestamp throughout the file, fragmentation would be < 1% on completion. But if the group sizes were more variable, fragmentation would ramp up rapidly. For example, if some chunks of data had 7 rows per timestamp, some 6, some 5, and so on, fragmentation would be over 99%, with the interleaving described in the question.
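To see how variable the duplicate groups are, a simple count per key works; a sketch against the input table from the question:

SELECT [TimeStamp], COUNT(*) AS RowsPerTimestamp
FROM dbo.MyTable
GROUP BY [TimeStamp]
ORDER BY RowsPerTimestamp DESC;  -- uneven group sizes correlate with the fragmentation seen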

Upvotes: 2

Joe C

Reputation: 3993

It's hard to tell without more data, but I bet that your timestamp being the clustered index key is causing this. Try sorting the data by this field before sending it to the output table.
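A minimal sketch of the idea: have the reader's source query return rows already in clustered-key order (table and column names taken from the question; whether this avoids the page splits depends on how the batches arrive):

SELECT [TagIndex], [TimeStamp], [RawQuality], [ValQuality],
       [Sigma], [Corrected], [Raw], [Delta], [Mean], [ScadaTimestamp]
FROM dbo.MyTable
ORDER BY [TimeStamp];  -- match the destination's clustered index key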

Upvotes: 2
