graphene
graphene

Reputation: 157

duplicate set key value in SSIS

I need to pivot rows to columns in SSIS. I am using Integration Services in the Microsoft Visual Studio version 2010.

I have a flat file with the following info:

column 0       column1     column2
-------------------------------------
d-5454-s34    name          Frans
d-5454-s34    sd            xyh
d-5454-s34    description   Group zen
d-5454-s34    member        xxxx
d-5454-s34    member        yyyy
d-5454-s34    member        zzzzz
d-5454-s34    member        uuuuu
d-5454-s45    name          He-man
d-5454-s45    sd            ygh
d-5454-s45    description   Group Comics
d-5454-s45    member        eeee
d-5454-s45    member        ffffff
e-3434-t45    name          Calvin
e-3434-t45    sd            trdg

and the final output should be

id                name          sd      description          member
---------------------------------------------------------------------------

d-5454-s34        Frans         xyh      Group zen       xxxx; yyyy; zzzzz; uuuuu

d-5454-s45        He-man        ygh      Group Comics    eeee; ffffff

e-3434-t45        Calvin        trdg     NULL                NULL

I have used the flat file component and the result is the same as you see BEFORE the final output (check above).

If I setup with the pivot component in SSIS as follows: I set the PIVOT KEY as column 1 (it contains rows Name, sd, description and member - this last is repeated....) , the SET KEY as column 0 as we have the id that should not be repeated. :) and finally the pivot value as column 2. Afterwards I have set pivot output columns as C_NAME, C_sd, C_description, C_member... but as member is repeated in several rows it is throwing this error... Duplicate key value "member" ... how to overcome this?

Just to test i have deleted all remaining Members leaving only one member, in this way it works. Now I need to get a way to aggregate the several rows with MEMBER duplicated (column 0). How to use the aggregate function of SSIS to group only the member in column 1 and connecting all the different values for member in column 2 separated by ; as shown in the last table. Thank you.

[ssis[1]

ssis 2

error

Upvotes: 0

Views: 887

Answers (2)

Kobi
Kobi

Reputation: 2524

SSIS provides a lot of transformations, but most of time, insert data into a temp table and write a simple query can save a lot of time and performance may be better.

for example:

with #tempTable as (
select 'd-5454-s34' column0, 'name' column1, 'Frans' column2
union all select 'd-5454-s34', 'sd ', 'xyh'
union all select 'd-5454-s34', 'description', 'Group zen'
union all select 'd-5454-s34', 'member', 'xxxx'
union all select 'd-5454-s34', 'member', 'yyyy'
union all select 'd-5454-s34', 'member', 'zzzzz'
union all select 'd-5454-s34', 'member', 'uuuuu'
union all select 'd-5454-s45', 'name', 'He-man'
union all select 'd-5454-s45', 'sd', 'ygh '
union all select 'd-5454-s45', 'description', 'Group Comics'
union all select 'd-5454-s45', 'member', 'eeee'
union all select 'd-5454-s45', 'member', 'ffffff'
union all select 'e-3434-t45', 'name', 'Calvin'
union all select 'e-3434-t45', 'sd', 'trdg'
)
SELECT column0
     , [name]
     , sd
     , description
     , member
FROM   ( SELECT column0,column1, column2 , STUFF(( SELECT '; ' + column2 
                                                     FROM #tempTable T1 
                                                    WHERE T1.column0 = t2.column0 
                                                      AND column1    = 'member' 
                                              FOR XML PATH('') ),1, 1, '') member
          FROM   #tempTable t2 ) t
PIVOT ( MAX(t.column2) FOR t.column1 IN ([name], sd, description)) AS pivotable 

Upvotes: 1

Abhishek
Abhishek

Reputation: 2490

You would need to change your approach a bit and transform (aggregate) your data before you are actually doing the pivot operation.

Built a sample package to demonstrate the solution - As per the package the data needs to be sorted first as the job would be comparing records with each other. Next we need a script component (type transformation). Select all the required input and create the necessary output columns. The data type of the output columns would be same as input just make sure to increase the size of the last column(column3). Also, make sure the script component is asynchronous because it throws out a different number of rows than there are incomming.

Use the below code in script component which would be checking the previous row value and appending the data as a semi-colon separated list of related records.

    bool initialRow = true;     // Indicater for the first row
    string column0 = "";        
    string column1 = "";       
    string column2 = "";

    public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        // Loop through buffer
        while (Buffer.NextRow())
        {
            // Process an input row
            Input0_ProcessInputRow(Buffer);

            // Change the indicator after the first row has been processed
            initialRow = false;
        }

        // Check if this is the last row
        if (Buffer.EndOfRowset())
        {
            // Fill the columns of the existing output row with values
            // from the variable before closing this Script Component
            Output0Buffer.Column0 = column0;
            Output0Buffer.Column1 = column1;
            Output0Buffer.Column2 = column2;
        }
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (initialRow)
        {
            // This is for the first input row only

            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            column0 = Row.column0;
            column1 = Row.column1;
            column2 = Row.column2;
        }
        else if ((!initialRow) & ((column0 != Row.column0) || (column1 != Row.column1)))
        {
            // This isn't the first row, but either the column1 or column2 did change

            // Fill the columns of the existing output row with values
            // from the variable before creating a new output row
            Output0Buffer.Column0 = column0;
            Output0Buffer.Column1 = column1;
            Output0Buffer.Column2 = column2;

            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            column0 = Row.column0;
            column1 = Row.column1;
            column2 = Row.column2;
        }
        else if ((!initialRow) & (column0 == Row.column0) & (column1 == Row.column1) & (column1 == "member"))
        {
            // This isn't the first row, and the column (member) did not change

            // Concatenate the studentsname to the variable
            column2 += ";" + Row.column2;
        }
    }

enter image description here

Reference: link

Upvotes: 1

Related Questions