Reputation: 8734
Is it better to bulk load N batches of 1 MB data (high freq) or 1 batch of X MB data (low freq)?
The problem for me is that parsing and processing the data also take time, so parsing, processing and persisting a gigantic dataset fully in parallel does not seem like the best approach: it results in many small, high-frequency bulk inserts.
Instead, should parsing and processing accumulate results into a large batch of size X, and then dispatch a (parallelised) bulk insert of that batch?
Is this correct? If so, what is a recommended size for X?
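To make the idea concrete, here is a minimal sketch of the accumulate-then-flush approach. All names are illustrative assumptions: `bulk_insert` stands in for whatever bulk-load call your database driver actually provides.

```python
BATCH_SIZE = 10_000  # the "X" in question; needs tuning per system

def load(records, bulk_insert, batch_size=BATCH_SIZE):
    """Accumulate parsed rows and dispatch one bulk insert per full batch."""
    batch = []
    for row in records:
        batch.append(row)          # parsing/processing result lands here
        if len(batch) >= batch_size:
            bulk_insert(batch)     # one large, infrequent insert
            batch = []
    if batch:                      # flush the final partial batch
        bulk_insert(batch)
```

With this shape, the parse/process stage runs continuously while inserts happen only once per `batch_size` rows.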
Upvotes: 1
Views: 556
Reputation: 32693
The optimal batch size depends on your hardware, on the processing you are doing, and on the amount of existing data. Only you can tell.
A smart algorithm would insert a few batches of size N and measure the performance, then a few batches of size 2*N, then a few batches of size 4*N, etc., until the performance starts to degrade, and automatically settle on the optimal batch size.
As the database grows the optimal size of the batch would change as well, so the algorithm should adjust itself with time.
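A rough sketch of that self-tuning probe, assuming a `measure` callback you supply that returns the seconds taken to bulk-insert a batch of the given size (the callback and all names here are hypothetical, not a specific library API):

```python
def find_batch_size(measure, start=1_000, trials=3):
    """Double the batch size until rows/sec stops improving.

    measure(size) -> seconds taken to bulk-insert `size` rows once.
    Returns the last batch size whose throughput was still improving.
    """
    best_size, best_rate = start, 0.0
    size = start
    while True:
        elapsed = sum(measure(size) for _ in range(trials))
        rate = trials * size / elapsed     # rows per second at this size
        if rate <= best_rate:              # performance started to degrade
            return best_size
        best_size, best_rate = size, rate
        size *= 2
```

To keep adapting as the database grows, you could rerun the probe periodically, or fold the same measure-and-compare logic into the regular insert loop.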
If it is a one-off task, run a few tests with various batch sizes manually.
Upvotes: 1