mud_punk
mud_punk

Reputation: 43

What is the "Buffer Size" & "Max Rows" Relationship to Multiple Source and Destination flows "Rows Per Batch" and "Insert Commit Size"?

I have a Data Flow Task that moves a bunch of data from multiple sources to multiple destinations. About 50 in all. The data moved is from one database to another with varying rows and columns in each flow.

While I believe I understand the basic idea behind the Data Flow Task's DefaultBufferMaxRows and DefaultBufferSize as it relates to Rows per Batch and Maximum insert commit size of the Destination, it's not clear to me what happens when there are multiple unrelated source and destination flows.

What I'm wondering is which of the following makes the most sense :

I believe I read some place that it's better to have each source and destination in it's own data flow task, but I am unable to find the link at this time.

Most examples I've been able to locate online seem to always be for one source to one or more destinations, or just one to one.

Upvotes: 4

Views: 1719

Answers (1)

Ferdipux
Ferdipux

Reputation: 5246

Let me go from the basis. Data Flow Task is a task, organizing a pipeline of data from Data Source to Data Destination. It is a unique task in SSIS because it runs data manipulation in SSIS itself, all other tasks call external systems to do something with data out of SSIS.
On the relationships between DefaultBufferMaxRows, DefaultBufferSize as it relates to Rows per Batch and Maximum insert commit size of the Destination. There is no direct relation. DefaultBufferMaxRows and DefaultBufferSize are properties of Data Flow pipeline; the pipeline processes rows in batches and these properties controls the processing batch size. These properties control RAM consumption and performance of Data Flow Task.
On other hand, Rows per Batch and Maximum insert commit size are the properties of Data Destination, namely OLE DB Destination in Fast Load mode only; it controls performance of Data Destination itself. You may have a Data Flow with Flat File Destination where you do not have Rows per Batch, but it will definitely have DefaultBufferMaxRows and DefaultBufferSize properties.

Typical usage from my experience:

  • DefaultBufferMaxRows and DefaultBufferSize control batch size of Data Flow pipeline. Tuning it is a tradeoff - bigger batches means less overhead on batch handling i.e. less execution time, but more RAM consumption. More RAM means that you might experience outage of RAM and DFT data buffers will be swapped to Disk.
    In SSIS 2016+ there is a "magical setting" AutoAdjustBufferSize which tells the engine to autogrow the buffer.
    Values for these properties are usually defined at performance tests in QA environment. On development - use the defaults.
  • Rows per Batch and Maximum insert commit size -- control log growth and possibility to rollback all changes. Do not change these unless you really need to do so. Defaults are generally Ok; I changed it rarely on special reason. More on its functions.

On package design:

  • 1 pair of Source-Destination per DFT (Data Flow Task). This is optimal - gives you most of control in terms of tuning and execution order etc. Also you can utilize parallel execution of tasks by SSIS engine. BTW, it simplifies debugging and support.
  • Division in groups. You can group DFT in Sequence groups and define common properties via Expressions-Variables. But - use it if you really need to do so because it complicates your design.
  • All Source-Destination in one DFT. I would recommend against it, complex and error prone.

As a bottom line, keep it simple -- 1 pair of Source-Destination per DFT, and play with your parameters only if have to do so.

Upvotes: 6

Related Questions