ilija veselica
ilija veselica

Reputation: 9574

SSIS Conditional Split - multiple conditions

I am trying to check if any of the column has changed. I have tried put all in on condition:

Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes) || Checksum != (ISNULL(LookupChecksum) ? 0 : LookupChecksum) || FeeIncome != (ISNULL(LookupFeeIncome) ? 0 : LookupFeeIncome) || CommissionReceived != (ISNULL(LookupCommissionReceived) ? 0 : LookupCommissionReceived) || CommissionPaid != (ISNULL(LookupCommissionPaid) ? 0 : LookupCommissionPaid) || Premium != (ISNULL(LookupPremium) ? 0 : LookupPremium)

but this is always returning FALSE although I have manually changed Taxes. If I put each condition separately than it works, for example:

Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes)

returns TRUE. If I use 6 conditions (instead of 1) for each column and output results into Union All is this method giving me what I need? My biggest concern is if rows will be duplicated. I have checked and looks like they are not but I wonder why some condition picks X records and another Y when I have changed both columns (related to these two conditions). For example, all Taxes and Premium columns are changed. In Conditional Split output "Taxes have changed" condition picks 1,000,000 rows, "Premium has changed" picks 100 rows. I know that this doesn't make any difference in my case because for me it's important that these rows are picked up for update but I am just confused about how this thing works.

Upvotes: 1

Views: 34644

Answers (3)

Bok Woon
Bok Woon

Reputation: 1

Actually the answer would be as follows

(Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes)) ||
(Checksum != (ISNULL(LookupChecksum) ? 0 : LookupChecksum)) ||
(FeeIncome != (ISNULL(LookupFeeIncome) ? 0 : LookupFeeIncome)) ||
(CommissionReceived != (ISNULL(LookupCommissionReceived) ? 0 : LookupCommissionReceived)) ||
(CommissionPaid != (ISNULL(LookupCommissionPaid) ? 0 : LookupCommissionPaid)) ||
(Premium != (ISNULL(LookupPremium) ? 0 : LookupPremium))

The extra "(" and ")" at the beginning and end of each variable checks is needed. The || (OR) condition will think that Taxes != the OR condition, as in the first part of your condition.

This will work.

Upvotes: 0

Constance Mitchell
Constance Mitchell

Reputation: 11

An alternative is to use 2 Derived Columns Data Flow Transformations (DFT) prior to a Conditional Split DFT.

Derived Column DFT 1: Check each attribute setting a value of 1 if there is a change in the data and 0 if there is no change. For example, check the value of the inbound Data of Birth Column to the Database Date of Birth Column.

   DerivedColumn1 =  ((!ISNULL(InDOB) && !ISNULL(DbDOB) && InDOB != DbDOB) || (ISNULL(DbDOB) && !ISNULL(InDOB))) ? 1 : 0

DerivedColumn1 result is a signed integer value = 1 or 0.

DerivedColumn2: Sum the Derived Column values from DFT1.

   IdentifiedChange = DerivedColumn1 + DerivedColumn2 + ....

Conditional Split DFT: Identifies if there is a change in the data as determined by the result of DFT2.

   YesChange   IdentifiedChange > 0

Hope this helps.

Upvotes: 1

billinkc
billinkc

Reputation: 61201

I believe your logic is sound but I would suggest you split all those chained conditionals out into separate derived columns, especially if you'll have 1M rows flowing through it.

The first reason is performance. By splitting out operations into smaller pieces, the data flow engine can better take advantage of parallelism. Investigation: Can different combinations of components affect Dataflow performance?. Money quote from SQL CAT on the subject

Our testing has shown that if you want to change more than one column in a data flow by using a Derived Column task, there is a performance benefit to splitting that transformation into multiple Derived Column tasks. In this note we will show that doing many smaller pieces of work in multiple Derived Column tasks is significantly faster than doing all the work in a more complex work in a single Derived Column task.

The second reason is maintainability. Not only is the expression editor unfriendly and unforgiving, it makes it incredibly challenging to inspect intermediate values.

Demo

I put together a reproduction package that uses a script task to sends N rows down a data flow with the same value in all the columns as the row number. In the first data flow, I modify the values of Checksum and Premium as I load into a cache connection manager (to simulate lookup values differing). Even numbered rows should have the Checksum nulled out and every third row should have Premium nulled.

cache connection manager load

In this data flow, I used both your original expression (All in one check) as well as split it out into a check per condition.

data flow

As you can maybe see by the data viewer attached to the "bit bucket" task, the Changed post-fixed named columns only evaluate to True when there is a difference between the source and lookup value. (The row corresponding to 0 is accurate as (ISNULL(LookupTaxes) ? 0 : LookupTaxes) forces null values to be zero.

data viewer

Were I you, at this point I'd replace the "bit bucket" transformation with a Conditional split

  • Output Name = UpdateRequired
  • Condition = [TaxesChanged] || [ChecksumChanged] || [FeeIncomeChanged]|| [CommissionReceivedChanged] || [CommissionPaidChanged] || [PremiumChanged]

If you continue to have issues, then you can put a data viewer on the pipeline to find the conditions that are not evaluating as expected.

Upvotes: 2

Related Questions