Wichie Artu
Wichie Artu

Reputation: 125

In SSIS, How to lookup ID for value, if no match, insert value and return ID

With SSIS, I am loading a staged extract into its master database. Part of the process is using LOOKUP to get the ID of a string value, and if it doesn't exist, insert it into the reference table (with OLE DB COMMAND) and return that ID through a followed lookup.

*SSIS Screenshot*

The LOOKUP says no match for the first ~50 values even after inserted. Is there a way I can fix this? Or, is there another approach I should try?

Some research/attempts...

  1. Originally, no matches were being returned. I had to change all the lookups to "No Cache" so it would reload the reference table (they are very small, so I doubt I'll have big performance problems).

  2. An alternative I've seen is multicast into different lookups (based on columns I'm looking up), and merge join them back together after I've inserted/looked them successfully. But from what I've read, there are heavy performance issues with this approach.

I expect the lookup to match or not match.If not match, insert new value; then, lookup value again for ID (if no match, fail). Union both original matched and original not matched to do next lookup.

Upvotes: 2

Views: 2576

Answers (2)

Wichie Artu
Wichie Artu

Reputation: 125

SSIS Data Flow

Here is the final SSIS data flow model... Per @KeithL suggestion, I added sorts before the merge joins.

Script Component

using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

public class ScriptMain : UserComponent
{
    IDTSConnectionManager100 serverConMgr;

    public override void PreExecute()
    {
        base.PreExecute();
        serverConMgr = this.Connections.Connection;
    }
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        using (SqlConnection serverCon = new SqlConnection(serverConMgr.ConnectionString))
        {
            SqlCommand sql = new SqlCommand(
            "INSERT INTO DIM.CONTRIBUTOR (CONTRIBUTOR) VALUES ('" + Row.CONTRIBUTOR + "') " +
            "\n SELECT CAST(SCOPE_IDENTITY() AS INT) AS 'ID'"
            , serverCon);

            serverCon.Open();

            SqlDataReader data = sql.ExecuteReader();

            while (data.Read())
            {
                Row.CONTRIBUTORID = data.GetInt32(0);
            }

            data.Close();
        }
        
    }
}

The "script component" is the trickiest part. It inserts the new value and retrieves the corresponding ID. Below is the code for that script.

Note 3 important things...

  1. Database Connection manager (in this case, named "Connection", MUST be ADO.Net connection)
  2. Row.CONTRIBUTOR_ID is a row I manually configured as an output in the "Inputs and Outputs" section of the script editor
  3. "GetInt32()" method for SqlDataReader does not work with SCOPE_IDENTITY unless SCOPE_IDENTITY is explicitly cast as int in the sql command

Here are some other notes related to the script...

  1. You CAN insert and return SCOPE_IDENTITY through OLE DB command, but it's fairly complicated
  2. Here is the Microsoft Document reference to how to retrieve data using SqlDataReader

Upvotes: 1

KeithL
KeithL

Reputation: 5594

Try this pattern:

enter image description here

1.You are multicasting the dataset and grouping by just your lookup column
2.inserting your lookup and adding the scopeidentity(lookupid) to you grouped data
3.joining in the data you inserted to effective add your lookup value to every row
4.unioning your data back in

I am not positive, but you may need to add sorts to use merge join.

Upvotes: 1

Related Questions