user1389739
user1389739

Reputation: 111

Concatenate Values from multiple rows in SSIS

I have a file source where the data looks like this:

ID BarcodeNumber
------------------------
1  123456789    
1  33    
2  987654321    
2  44

I would like to get the data as follows:

ID BarcodeNumber
------------------------
1  12345678933    
2  98765432144

Any suggestions on how to accomplish this?

EDIT: I would like to do this in the data flow as this could potentially have billions of rows.

Upvotes: 4

Views: 10796

Answers (2)

Anoop Verma
Anoop Verma

Reputation: 1505

Data flow

Source File:

ID,RowNum,Barcode   
1,1,123456789  
1,2,33    
2,1,987654321    
2,2,44

Split the rows of the file. For the simple case as assumed in the source file here, we can use the conditional split. for more involved situation, we can use Script component as the splitter. From here on, the solution would be pretty standard wrt sort and merge. Once merged, add a derived column: TRIM(Barcode1) + TRIM(Barcode2) where Barcode1 is coming from the "left side" which has rows with RowNum 1, and Barcode2 from the "right side". If you are concerned about the performance of sort component, put the split rows into two tables, and compare the performance.

Cond. Split: RowNum1 (RowNum == "1"), RowNum2 (RowNum == "2")
Sort: By ID; Pass through: Barcode
Merge Join: Left outer join, Join key: ID; Output: ID, Barcode1 and Barcode2
Derived column: TRIM(Barcode1) + TRIM(Barcode2)

Upvotes: 4

billinkc
billinkc

Reputation: 61201

The desired state is that for each ID, all of the barcodes will be concatenated together.

To accomplish this, your data must be sorted by ID. If the file isn't sorted already, then you'll need to run it through the Sort task.

Once you have sorted data, then you will need to add a Script Transformation in Asynchronous mode. It'll be async since the number of rows in will not be the same number as rows out.

The psuedo code will be something like

If Row.ID <> LastID
    If LastID <> ""
        Output0Buffer.ID = LastID
        Output0Buffer.Barcodes = Barcodes
    LastID = Row.ID
    Barcodes = ""

Barcodes = Barcodes + Row.Barcodes

Caveats

  • The above code is neither C# or VB.NET but the logic should be sound
  • If you have billions of rows, you are going to need some serious RAM on this machine as SSIS is an in-memory transformation system.
  • The Sort operation and this script task are going to be slow and memory intensive as they cannot re-use memory space

Upvotes: 6

Related Questions