ocean800
ocean800

Reputation: 3727

SSIS - Fail if any row in the source dataset has missing values?

Lets say that I have the following data:

FirstName | LastName | Age | Date Added to Database (derived column) 
John      | Doe      | 23  | dd/mm/yyyy
Jane      | Doe      | 31  | dd/mm/yyyy
Bob       |          | 40  | dd/mm/yyyy
Janet     | Smith    |     | dd/mm/yyyy

And I would like to bring it in from an Excel/Flat File source to Sql Server. However, I would like the task to fail completely if any row in the source dataset is missing a value. So far I've seen tutorials where we can do something like this:

enter image description here

Where records with null values are split if they are containing null values. However, I don't want to insert ANY values at all if any of the values are null, rather I would want to do something like send an email on failure notifying that values are missing.

Is this possible in SSIS? Could it be done with a staging table, etc?

Upvotes: 2

Views: 1109

Answers (3)

Roger Wolf
Roger Wolf

Reputation: 7692

You can definitely utilise a staging table as an intermediate point in your data flow, and then validate whatever logic is required in SQL before moving it further. This approach is pretty simple and has an added benefit that, whenever your process fails, your staging table will contain all the data from the latest file, so investigating it is easier.

All alternatives I can think of are much worse in terms of performance and / or resources required, and aren't generally used. For example, it is theoretically possible to load the file data via the Script Task source into an in-memory ADO recordset, checking the values in the script as you send them to the output buffer, and throwing an exception if anything is wrong with the data. If this data flow doesn't fail, then the next step might load the data from that recordset into an actual table.

(Disclaimer: The aforementioned alternative is so awkward and inefficient that should never be considered a viable approach. I don't know where to start to describe its disadvantages. Personally, I might consider something like ths only if my actual goal is "screw things up as much as I can without giving my employer a formal casus belli", which I never do.)

Upvotes: 3

billinkc
billinkc

Reputation: 61221

Nothing out of the box directly supports this scenario but I can think of a few approaches you can cobble together to achieve your request.

The cleanest/easiest to implement is just stuff all your data into a staging table and since I'm lazy, I'd add a computed column to the table definition that flags a row as having a null somewhere.

ALTER TABLE dbo.stageData 
ADD HasNull AS (CASE WHEN Col1 IS NULL OR Col2 IS NULL THEN 'Y' ELSE 'N');

You'd then add two Execute SQL Tasks after the data flow. The first sends email (assuming you have msdb.dbo.sp_send_dbmail set up) I'd do something

IF EXISTS (SELECT * FROM dbo.StageData AS SD WHERE SD.HasNull = 'Y')
BEGIN
    EXECUTE msdb.dbo.sp_send_db_mail @recipient = 'ocean800', @subject = 'Null detected';
END 

I'd then have an Execute SQL Task that snaps the data from stage into the final table. Double check my logic here as I haven't coded this up. It might need to be ANY or just repeat the send mail logic and have the insert as the ELSE clause. Hell, that might be simpler...

INSERT INTO dbo.Final SELECT * FROM dbo.StageData AS SD WHERE NOT EXISTS (SELECT * FROM dbo.StageData AS SDI SDI.HasNull ='Y');

The approach without a stage table would be to double process your file. The first Data Flow will be Flat File Source to Conditional Split and then... whatever really. I'd go easy and use a Row Count transformation which will fill an SSIS variable you create called RowsNull. After the data flow runs, you'll have a value of zero (good to load) or greater than zero - send email.

Bifurcate the Data Flow task into Data Flow Task which actually stores to our table and Notification (Send Email Task, Execute SQL Task to send email, whatever). The trick here is to modify the Precedent Constraint between the parent task (Data Flow) and the child paths. You'll set it to Expression and Constraint. Constraint remains Successful and constraints are @[User::RowsNull] == 0 and @[User::RowsNull] > 0

Other less good methods could be to specify that the OLE DB Destination retains the default settings (Rows per batch is blank and max insert commit size is max int) That setting is an all or nothing commit - you just need something to blow up the implicit transaction so on your Conditional Split for the HasNull path, route rows to a Script Task that raise an Error event. That'll kill the load. Or define your target table with NOT NULL specifier on those columns and then data will load or not and you simply need to add an Email task of your preference to the Failure precedent constraint of the Data Flow to notify when the load failed.

Upvotes: 4

Chris Albert
Chris Albert

Reputation: 2507

This can be done with a staging table. For your sample data you would use the DDL below.

The trick is to have a NOT NULL constraint on each column. Further reading about the constraint can be found in this blog post on SQLShack.

When SSIS attempts to insert a NULL value it will fail. Not the most elegant solution but does what you asking without much overhead.

CREATE TABLE Staging
(
    FirstName VARCHAR(50) NOT NULL
    , LastName VARCHAR(50) NOT NULL
    , Age INT NOT NULL
    , DateAdded DATETIME NOT NULL
)

Upvotes: 1

Related Questions