Phil Scholtes

Reputation: 419

SSIS storing logging variables in a derived column

I am developing SSIS packages that consist of 2 main steps:

Step 1: Grab all sorts of data from existing legacy systems and dump them into a series of staging tables in my database.

Step 2: Move the data from my staging tables into a more relational set of tables that I'm using specifically for my project.


In step 1 I'm just doing a bulk SELECT and a bulk INSERT; however, in step 2 I'm doing row-by-row inserts into my tables using OLE DB Command transformations so that I can log very specific row-level activity of everything that's happening. Here is my general layout for the step 2 processes.

![step 2 data flow layout](http://dl.dropbox.com/u/2468578/screenshots/step_1.png)

You'll notice 3 OLE DB Command tasks: 1 for the actual INSERT, and 2 for success/fail INSERTs into our logging table.

The main thing I'm logging is source table/id and destination table/id for each row that passes through this flow. I'm storing this stuff in variables and adding them to the data flow using a Derived Column so that I can easily map them to the query parameters of the stored procedures.

![Derived Column mapping](http://dl.dropbox.com/u/2468578/screenshots/step_3.png)

I've decided to store these logging values in variables instead of hard-coding them in the SqlCommand field on the task, because I'm pretty sure you CAN'T put variable expressions in that field (i.e. exec storedproc @[User::VariableName], ..., ..., ...). So this is the best solution I've found.
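For context, the SqlCommand field of an OLE DB Command transformation does accept positional `?` placeholders, which are then mapped to input columns (including the Derived Columns carrying the variable values) on the Column Mappings tab. A sketch, with an illustrative procedure name not taken from the post:

```sql
-- SqlCommand text for the OLE DB Command transformation.
-- Each ? is a positional parameter mapped to an input column,
-- so the Derived Columns holding @[User::...] values flow in here.
EXEC dbo.usp_InsertDestinationRow ?, ?, ?, ?
```

This is why the Derived Column step is needed at all: the `?` placeholders can only bind to columns in the data flow, not directly to package variables.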

![Variable configuration](http://dl.dropbox.com/u/2468578/screenshots/step_2.png)

Is this the best solution? Probably not.

Is it good performance wise to add 4 logging columns to a data flow that consists of 500,000 records? Probably not.

Can you think of a better way?

Upvotes: 0

Views: 723

Answers (2)

Jason Horner

Reputation: 3690

To add to Cade's answer: if you truly need the logging info on a row-by-row basis, your best bet is to leverage the OLE DB Destination and use one or both of the following transformations to add columns to the data flow:

Derived Column Transformation

Audit Transformation

Either of these should do the job and shouldn't add much overhead.
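To illustrate, the Derived Column transformation takes SSIS expressions like the following (column and variable names here are illustrative, not from the question); the Audit transformation exposes similar package metadata such as package name and execution start time without any expressions:

```
SourceTable  :=  @[User::SourceTableName]
DestTable    :=  @[User::DestTableName]
LoadDate     :=  GETDATE()
PackageName  :=  @[System::PackageName]
```

The new columns then flow into the OLE DB Destination alongside the data columns, so the logging values land in the same bulk insert instead of a per-row command.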

Upvotes: 0

Cade Roux

Reputation: 89741

I really don't think calling an OLE DB Command 500,000 times is going to be performant.

If you are already going to staging tables, load it all into a staging table and take it from there in T-SQL or even another data flow (or into a raw file and then something else, depending on your complete operation). A bulk insert is going to be hugely more efficient.
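One set-based way to keep the per-row logging that the question asks for is T-SQL's `OUTPUT ... INTO` clause: a single `INSERT ... SELECT` moves the staged rows, and the same statement captures each row's source/destination ids into the log table. A sketch with assumed table and column names:

```sql
-- Illustrative schema: staging row id is SourceId, destination table
-- has an identity column DestId, and RowLog records the mapping.
INSERT INTO dbo.DestTable (SourceId, SomeColumn)
OUTPUT 'StagingTable', inserted.SourceId,
       'DestTable',    inserted.DestId
  INTO dbo.RowLog (SourceTable, SourceId, DestTable, DestId)
SELECT s.SourceId, s.SomeColumn
FROM dbo.StagingTable AS s;
```

This gives row-level lineage at bulk-insert speed, instead of one stored procedure call per row.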

Upvotes: 1
