MsCr3ole
MsCr3ole

Reputation: 45

Split a single record into multiple records in SSIS

I want to know how can one split a single field into multiple rows/line in SSIS?

I have a field called Comments that can hold up to 1240 characters. Each line can only hold up to 60 characters each. I need those rows to have a record ID of SA - ST depending on how long the comments are. Current record:

Comments
In the name of Robert of the House Baratheon, the First of his Name, King of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms and Protector of the Realm, by the word of Eddard of the House Stark, Lord of Winterfell and Warden of the North, I do sentence you to die.

I want an output from SSIS like the below:

SA In the name of Robert of the House Baratheon, the First of 
SB his Name, King of the Andals and the Rhoynar and the First 
SC Men, Lord of the Seven Kingdoms and Protector of the Realm,
SD  by the word of Eddard of the House Stark, Lord of Winterfe
SE  ll and Warden of the North, I do sentence you to die.

Upvotes: 0

Views: 354

Answers (1)

billinkc
billinkc

Reputation: 61201

I put together a quick C# implementation of what I think you're looking for (and can easily be changed to SSIS specific). There's a trivial difference in line SE You have a leading space before ll and I do not but I assume that's an artifact of your framing of the question.

Tester solution

// Tester solution for https://stackoverflow.com/q/65126044/181965
using System;
using System.Collections;
using System.Collections.Generic;
                    
public static class Extensions
{
    // https://stackoverflow.com/questions/3008718/split-string-into-smaller-strings-by-length-variable
    public static IEnumerable<string> SplitByLength(this string str, int maxLength) 
    {
        for (int index = 0; index < str.Length; index += maxLength) 
        {
            yield return str.Substring(index, Math.Min(maxLength, str.Length - index));
        }
    }   
}   
public class Program
{
    public static void Main()
    {
        // Simulate Row value
        string comments = "In the name of Robert of the House Baratheon, the First of his Name, King of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms and Protector of the Realm, by the word of Eddard of the House Stark, Lord of Winterfell and Warden of the North, I do sentence you to die.";
        
        // Given the above, the expectation is that we will break the string into 60 character length strings with S A-T preceding the line. Expected output
/*
SA In the name of Robert of the House Baratheon, the First of 
SB his Name, King of the Andals and the Rhoynar and the First 
SC Men, Lord of the Seven Kingdoms and Protector of the Realm,
SD  by the word of Eddard of the House Stark, Lord of Winterfe
SE  ll and Warden of the North, I do sentence you to die.
*/
        
        // Define our line length
        int lineLength = 59;
        
        // 65 is A in ASCII
        int commentEnumerator = 65;
        
        // Split comments into lineLength segments
        foreach (string line in comments.SplitByLength(lineLength))
        {
            // https://stackoverflow.com/questions/289792/int-to-char-in-c-sharp
            Console.WriteLine(string.Format("S{0} {1}", (char)commentEnumerator, line));
            commentEnumerator++;
        }
/*
SA In the name of Robert of the House Baratheon, the First of h
SB is Name, King of the Andals and the Rhoynar and the First Me
SC n, Lord of the Seven Kingdoms and Protector of the Realm, by
SD  the word of Eddard of the House Stark, Lord of Winterfell a
SE nd Warden of the North, I do sentence you to die.
*/
    }
    
}

Now that you have a working example to test your logic against, we need to port this into SSIS speak.

I need to have a Data Flow on my Control Flow. I am going to feed Comments into the pipeline (data flow) via an OLE DB Source which then routes to an Script Component. The only "tricky" bit is that the default Transformation mode of Synchronous won't "work" here as that assumes 1 row input yields 1 row of output.

OLE DB Source

I specify a query here and use the following

SELECT CAST('In the name of Robert of the House Baratheon, the First of his Name, King of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms and Protector of the Realm, by the word of Eddard of the House Stark, Lord of Winterfell and Warden of the North, I do sentence you to die.' AS varchar(1240)) AS Comments

Script Component

This will act as a Transformation (default).

On the Input Columns tab, check Comments and ReadOnly is fine

Input Columns tab, column Comments is checked and flagged as ReadOnly

On the Inputs and Outputs tab, select Output 0 and expand it. We need to change the property SynchronousID from the default Script Components.Inputs[Input 0] to None

Same as described in text but a big red arrow pointing at SynchronousID

Before you leave this tab, you need to specify the column(s) that will hold the output data. I presume we are creating two columns - RowId and Comment which I will define as strings of length 2 and 159 respectively. Click Add Column two times and then modify the Name, DataType and Length properties.

Future readers, for extra credit, you can define an SSIS variable of type Int32 and populate it with the lineLength and then add it to ReadOnly variables on the Script tab for a semi-flexible solution.

Click Edit Script.

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

public static class Extensions
{
    // https://stackoverflow.com/questions/3008718/split-string-into-smaller-strings-by-length-variable
    public static IEnumerable<string> SplitByLength(this string str, int maxLength)
    {
        for (int index = 0; index < str.Length; index += maxLength)
        {
            yield return str.Substring(index, Math.Min(maxLength, str.Length - index));
        }
    }
}

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    int lineLength;

    public override void PreExecute()
    {
        base.PreExecute();
        // This is where I would access the SSIS variable if I went that route
        lineLength = 59;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // 65 is A in ASCII
        int commentEnumerator = 65;
        foreach (string line in Row.Comments.SplitByLength(lineLength))
        {
            // https://stackoverflow.com/questions/289792/int-to-char-in-c-sharp
            Output0Buffer.AddRow();
            Output0Buffer.RowId = "S" + (char)commentEnumerator;
            Output0Buffer.Comment = line;
            commentEnumerator++;
        }
    }
}

And this is the results of that operation

Data viewer showing the SSIS version of the C# results

Upvotes: 3

Related Questions