dotnetN00b
dotnetN00b

Reputation: 5131

How do I validate data in a file in SSIS before inserting into a database?

What I want to do is take data from a dbf file and insert it in a table. Which I've already done. Since there are many files, a For-Each Container is being used. However, before inserting it into a table, I want to look at the date fields and compare it to a date variable. If the dates match the variable, then move on to the step of the flow. But if any of the dates don't match the variable, then that file and its contents are discarded and the next file is looked at.

How do I accomplish this in SSIS?

Upvotes: 0

Views: 2605

Answers (1)

billinkc
billinkc

Reputation: 61269

You're looking for the Conditional Split Component within your Data Flow Task.

Assuming your source column is MyDate and you have an SSIS Variable called @[User::ReferenceDate] then you'd apply an expression like

[MyDate] == @[User::ReferenceDate]

That will evaluate to True when the dates match, false otherwise.

In your Conditional Split, add a row into the component.

  • OutputName: DatesMatched
  • Condition: [MyDate] == @[User::ReferenceDate]
  • Default output name: DatesUnmatched

Now when you connect the output from this to your destination, it'll ask whether you want to route the data using the DatesMatched or DatesUnmatched path. Use the DatesMatched path.

As I re-read this, if any of the dates don't match the variable, then that file and its contents are discarded then you're looking at double processing the file. The first time to read it all in and validate it. The second time, optional, will actually load to the database.

From your Conditional Split, add a RowCount to the DatesUnmatched path. Use a Variable of type Integer/Int32 named CountDatesUnmatched. In a perfect world, that will be zero when the validation of the file completes.

In the Precedent Constraint between the Validation Data Flow and the actual Import Data Flow, double click the connector line and change the evaluation criteria from Constraint to Expression and Constraint. Leave the value as Success and in the Expression use @[User::CountDatesUnmatched] == 0 That data flow will only light up if both conditions are true: parsing was successful and no rows were sent to the Row Count component.

Finally, you can cheat and sometimes this approach makes sense. If you're using an OLE DB Destination, then you can use the MaximumInsertCommitSize of the default 2B and a data access mode of fast load. This translates to "Everything is going to commit or none of it is". That can lock up your target table and cause your transaction log to grow heavily depending on how much data you're loading. Use the Conditional Split as described above but for the DatesUnmatched path, induce a failure. A Derived column with divide by zero or a script task with an explicit FireError event will cause that transaction to go belly up. You'd need to do some magic in the OnError event handler to not abort the overall file processing but it's a lazy hack (or one that is useful when double reading the file is prohibitive but impacting the database is less so)

Upvotes: 2

Related Questions