Sachin
Sachin

Reputation: 1

How do I check whether all the rows for a column are NULL in SSIS?

I'm working on a SSIS package where I have a text file with 5 columns. I need to check if all the rows for 5th column are NULL values. If all the rows in 5th column are NULL then all the data should go for invalid file table. If any row in 5th column have non NULL value then all the data should go to valid table.

Upvotes: 0

Views: 1470

Answers (3)

userfl89
userfl89

Reputation: 4790

You can check if the file is empty with C# using an OleDbDataAdapter and search the file, then determine where to load the file using SSIS Precedence Constraints. This example uses a CSV file without column names. If the columns do have names add the replacement code noted in the comments below. You will also need the using statements listed.

  • Add an SSIS Boolean variable. This is IsColumnNull in the following example. Next add a C# Script Task with IsColumnNull variable in the ReadWriteVariables field, and (optionally) a variable holding the file path ReadOnlyVariables pane.
  • Next set Precedence Constraints to check for both a true condition (has null rows) or false condition (does not have null records). Since the IsColumnNull variable is a Boolean, use just the variable itself as the expression to check for all null rows, but add ! for non-nulls, i.e. !@[User::IsColumnNull].
  • Connect the appropriate Data Flow Tasks with each destination table to the corresponding Precedence Constraint. For example, add the Data Flow Task with the "invalid file table" as the destination after the Precedence Constraint checking for a true value in the IsColumnNull variable.

enter image description here

Precedence Constraint For Rows with Nulls:

enter image description here

Precedence Constraint for Rows without Nulls:

enter image description here

Script Task Example:

using System;
using System.Data;
using System.IO;
using System.Data.OleDb;
using System.Linq;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;


        string fullFilePath = Dts.Variables["User::FilePath"].Value.ToString();
        string fileName = Path.GetFileName(fullFilePath);
        string filePath = Path.GetDirectoryName(fullFilePath);

        string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
            + ";Extended Properties=\"text;HDR=No;FMT=Delimited\";";

        //add filter for NOT NULL on given column to only return non-nulls
        string sql = "SELECT F2 FROM " + fileName + " WHERE F2 IS NOT NULL";


        //if file has column names replce "connStr" and "sql" as shown below
        /*
        string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
            + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\";";

        string sql = "SELECT ID FROM " + fileName + " WHERE ID IS NOT NULL";
        */

        using (OleDbDataAdapter oleAdpt = new OleDbDataAdapter(sql, connStr))
        {
            DataTable dt = new DataTable();
            oleAdpt.Fill(dt);

            //if emtpy set IsColumnNull SSIS variable to true
            if (dt.Select().Count() < 1)
            {
                Dts.Variables["User::IsColumnNull"].Value = true;
            }
            else
            {
                Dts.Variables["User::IsColumnNull"].Value = false;
            }
        }

Upvotes: 0

KeithL
KeithL

Reputation: 5594

personally if i had to perform this task I would use a script task to do it all:

  1. Load into a data table
  2. Use linq to check column to determine destination .Where(x => x[4]!=null).Count()
  3. Load to destination via bulk Copy

Upvotes: 0

Ockert
Ockert

Reputation: 443

You will need to read the entire file before being able to make the decision of where to write it to, so introduce a third table where you can stage the data first

Next part would be to build the logic that checks the staging table for all NULLS. Below query would return 0 if all was NULL and more than 0 if any record had a value

SELECT COUNT(*) FROM dbo.StagingTable ST WHERE ST.Column5 IS NOT NULL

Once you feed the answer into a variable you can use precedence constraints to fire the dataflow copy [staging to active] if the result was more than 0 or [staging to faulty] if the result was 0

Upvotes: 1

Related Questions