Herman Schoenfeld

Reputation: 8734

Best strategy for gigantic SQL Server bulk Insert - high-frequency or low-frequency

Is it better to bulk load N batches of 1 MB data (high freq) or 1 batch of X MB data (low freq)?

The problem for me is that parsing and processing the data also take time, so parsing, processing and persisting a gigantic dataset fully in parallel does not seem like the best approach, because it results in many high-frequency bulk inserts of small batches.

Should parsing & processing instead accumulate rows into a large batch of size X, and then dispatch a (parallelised) bulk insert of that batch?

Is this correct? If so, what is a recommended size for X?
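For what it's worth, here is a minimal sketch of the accumulate-then-dispatch pattern described above, using Python with pyodbc. The connection string, target table `dbo.Target`, its columns, and `BATCH_SIZE` (the "X" in question) are all placeholder assumptions:

    import pyodbc

    # Hypothetical connection string and target table; adjust to your environment.
    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=.;DATABASE=Test;Trusted_Connection=yes")
    BATCH_SIZE = 50_000  # the "X" under discussion: rows accumulated per bulk insert
    SQL = "INSERT INTO dbo.Target (a, b) VALUES (?, ?)"

    def load(rows):
        """Accumulate parsed rows and dispatch one bulk insert per full batch."""
        with pyodbc.connect(CONN_STR) as conn:
            cursor = conn.cursor()
            cursor.fast_executemany = True  # parameter-array bulk path on SQL Server
            batch = []
            for row in rows:
                batch.append(row)           # parsing/processing output accumulates here
                if len(batch) >= BATCH_SIZE:
                    cursor.executemany(SQL, batch)
                    conn.commit()
                    batch.clear()
            if batch:                       # flush the final partial batch
                cursor.executemany(SQL, batch)
                conn.commit()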

Upvotes: 1

Views: 556

Answers (1)

Vladimir Baranov

Reputation: 32693

The optimal size of the batch depends on your hardware, the processing you are doing, and the amount of existing data. Only you can tell.

A smart algorithm would try to insert a few batches of size N and measure the performance, then a few batches of size 2*N, then a few batches of size 4*N, etc., until the performance starts to degrade, and automatically settle on the optimal batch size.

As the database grows, the optimal batch size will change as well, so the algorithm should adjust itself over time.
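A rough Python sketch of that doubling search, under the assumption that a hypothetical `insert_batch(batch)` function performs one bulk insert of the given rows:

    import time

    def find_batch_size(insert_batch, rows, start=1_000, trials=3):
        """Double the batch size until throughput stops improving, then settle.

        insert_batch(batch) is assumed to perform one bulk insert (hypothetical).
        """
        best_size, best_rate = start, 0.0
        size = start
        while size <= len(rows):
            # Time a few batches at this size and compute throughput (rows/sec).
            elapsed, inserted = 0.0, 0
            for i in range(trials):
                batch = rows[i * size:(i + 1) * size]
                if not batch:
                    break
                t0 = time.perf_counter()
                insert_batch(batch)
                elapsed += time.perf_counter() - t0
                inserted += len(batch)
            rate = inserted / elapsed if elapsed else 0.0
            if rate <= best_rate:      # performance degraded: keep the previous size
                break
            best_size, best_rate = size, rate
            size *= 2                  # try N, 2*N, 4*N, ...
        return best_size

Rerunning this search periodically (or whenever throughput drifts) is one way to let the batch size adjust as the database grows.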

If it is a one-off task, do a few tests with various batch sizes manually.

Upvotes: 1
