Reputation: 6530
I have created a SSIS package
In data flow task I have few columns
Now I am converting data type of column 0 and column 3 to int, intentionally to create an error:
Now I have configured error as:
After that I have kept a data viewer and execute the package, no surprise I got the expected error output:
I can see that I have got 3 more columns
But my questions are:
Upvotes: 1
Views: 2299
Reputation: 5366
This is easy in SS2016 and above: https://www.mssqltips.com/sqlservertip/4066/retrieve-the-column-causing-an-error-in-sql-server-integration-services/
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
IDTSComponentMetaData130 componentMetaData = this.ComponentMetaData as IDTSComponentMetaData130;
Row.ErrorColumnName = componentMetaData.GetIdentificationStringByID(Row.ErrorColumn);
}
For anyone using SQL Server versions before SS2016, here are a couple of reference links for a way to get the Column name:
which is based on: http://toddmcdermid.blogspot.com/2016/04/finding-column-name-for-errorcolumn.html
I appreciate we aren't supposed to just post links, but this solution is quite convoluted, and I've tried to summarise by pulling info from both Todd and Andrew's blog posts and recreating them here. (thank you to both if you ever read this!)
From Todd's page:
- Go to the "Inputs and Outputs" page, and select the "Output 0" node. Change the "SynchronousInputID" property to "None". (This changes the script from synchronous to asynchronous.)
- On the same page, open the "Output 0" node and select the "Output Columns" folder. Press the "Add Column" button. Change the "Name" property of this new column to "LineageID".
- Press the "Add Column" button again, and change the "DataType" property to "Unicode string [DT_WSTR]", and change the "Name" property to "ColumnName".
- Go to the "Script" page, and press the "Edit Script" button. Copy and paste this code into the ScriptMain class (you can delete all other method stubs):
public override void CreateNewOutputRows() {
IDTSInput100 input = this.ComponentMetaData.InputCollection[0];
if (input != null)
{
IDTSVirtualInput100 vInput = input.GetVirtualInput();
if (vInput != null)
{
foreach (IDTSVirtualInputColumn100 vInputColumn in vInput.VirtualInputColumnCollection)
{
Output0Buffer.AddRow();
Output0Buffer.LineageID = vInputColumn.LineageID;
Output0Buffer.ColumnName = vInputColumn.Name;
}
}
} }
Feel free to attach a dummy output to that script, with a data viewer, and see what you get. From here, it's "standard engineering" for you ETL gurus. Simply merge join the error output of the failing component with this metadata, and you'll be able to transform the ErrorColumn number into a meaningful column name.
But for those of you that do want to understand what the above script is doing:
- It's getting the "first" (and only) input attached to the script component.
- It's getting the virtual input related to the input. The "input" is what the script can actually "see" on the input - and since we didn't mark any columns as being "ReadOnly" or "ReadWrite"... that means the input has NO columns. However, the "virtual input" has the complete list of every column that exists, whether or not we've said we're "using" it.
- We then loop over all of the "virtual columns" on this virtual input, and for each one...
- Get the LineageID and column name, and push them out as a new row on our asynchronous script.
The image and text from Andrew's page helps explain it in a bit more detail:
This map is then merge-joined with the ErrorColumn lineage ID(s) coming down the error path, so that the error information can be appended with the column name(s) from the map. I included a second script component that looks up the error description from the error code, so the error table rows that we see above contain both column names and error descriptions.
The remaining component that needs explaining is the conditional split – this exists just to provide metadata to the script component that creates the map. I created an expression (1 == 0) that always evaluates to false for the “No Rows – Metadata Only” path, so no rows ever travel down it.
Whilst this solution does require the insertion of some additional plumbing within the data flow, we get extremely valuable information logged when errors do occur. So especially when the data flow is running unattended in Production – when we don’t have the tools & techniques available at design time to figure out what’s going wrong – the logging that results gives us much more precise information about what went wrong and why, compared to simply giving us the failed data and leaving us to figure out why it was rejected.
Upvotes: 1