Namkce
Namkce

Reputation: 514

SSIS consolidate and concatenate multiple rows into single rows without using SQL

I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.

Given entities: [Contract Item] (many) to (one) [Account]

There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated to it, in which case, we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows already solved within my data flow).

This can be accomplished by writing to a temporary table, and then using a query or view to obtain the summarized results as followed. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:

SELECT distinct accountid,
        (SELECT TT2.ari_productsummary + '; ' 
                FROM TESTTABLE TT2
                WHERE TT2.accountid = TT.accountid
                FOR XML PATH ('')
            ) AS 'ari_activecontractitems'
FROM TESTTABLE TT

Executing that Query provides me the results that I want, which I can then use for importing into the Account entity as shown below:

SSIS data flow, truncate temp sql table, write to sql table, pull from view and write to object

But how do I do this in a SSIS dataflow without writing to a SQL table as a temporary placeholder for the data?? I want to do the entire process inside one dataflow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly:

Mock-up of dataflow that I am trying to build out that doesn't require a SQL temp table/view

Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?

I am using VS 2017 and the KingswaySoft Dynamic CRM 365 ETL toolset to develop my solution/package.

Upvotes: 1

Views: 1465

Answers (2)

Daniel Cai
Daniel Cai

Reputation: 446

We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need of writing custom code. The feature is currently under preview, please reach out to us for private access to the feature. You can find our contact information on our website.

UPDATE - June 5, 2020, we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as a result of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component will take input values and transform into a composite value in a SSIS column. The Decomposition component does the opposite, it would take an input value and split it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.

Upvotes: 1

billinkc
billinkc

Reputation: 61269

Spit balling here as I don't Dynamics nor do I have the custom components.

Data Flow 1 - Contract aggregation

The purpose of this data flow is to replicate your logic in the elegant query you provided and shove that into a Cache Connection Manager (see Notes for 2008+ at the end)

KingswaySoft Dynamics Source -> Script Task -> Cache Transform

If you want to keep the sort in there, do it before the script task. The implementation I'll take with the Script Task is that it's fully blocking - that is all the rows must arrive before it can send any on. Tasks like the Merge Join are only partially blocking because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.

The Script Task is going to be asynchronous transformation. You'll have two output columns, your key accountid and your new derived column of ari_activecontractitems. That column will might need to be big - you'll know your data best but if it's a blob type in Dynamics (> 4k unicode or > 8k ascii characters) then you'll have to define the data type as DT_TEXT/DT_NTEXT

As inputs, you'll select accountid and ari_productsummary from your source.

The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.

    //  member variable
    Dictionary<string, List<string>> accumulator;

The PreProcess method, we'll tack this in there to initialize our variable

    // initialize in PreProcess method
    accumulator = new Dictionary<string, List<string>>();

In the OnBufferRowSent (name approx)

    // simulate the inbound queue
    // row_id would be something like Rows.row_id
    if (!accumulator.ContainsKey(row_id))
    {
        // Create an empty dictionary for our list
        accumulator.Add(row_id, new List<string>());
    }

    // add it if we don't have it
    if (!accumulator[row_id].Contains(invoice))
    {
        accumulator[row_id].Add(invoice);
    }

Once you get the signal sent of no more data available, that's when you start buffering output data. The auto generated code will have placeholders for all this.

    // This is how we shove data out the pipe
    foreach(var kvp in accumulator)
    {
        // approximately thus
        OutputBuffer1.AddRow();
        OutputBuffer1.row_id = kvp.Key;
        OutputBuffer1.ari_productsummary = string.Join("; ",  kvp.Value);

    }

Upvotes: 1

Related Questions