not_a_soccermom
not_a_soccermom

Reputation: 31

SSIS need to add same rownumber to rows, then reset when it finds a certain value in a row, from file source

I have a file source where the data is not in normalized form with any sort of primary key value or repeating group value. I am using Merge Join to put the multiple rows into one merged row. I need to apply some row numbering so that I have a join between the multiple rows, to get them into the one single row for the merge join.

Here is what the source data looks like:

Data Rows:
MSH|BLAH|||BLAHBLAH15|BLAHZ|||
EVN|MOREBLAH|BLAHBLAH11|BLAHY|||
PID|BLAHXX|BLAHBLAH655|BLAHX|||
PV1|BLAHX2|BLAHBLAH42|BLAHX|||||||||
DG1|1||84|XXXX||A
IN1|1||11400|TEST
IN1|2||20100|TEST2
MSH|BLAH2|BLAHBLAH5|BLAHZ|||
EVN|BLAH6|20220131123100
PID|BLAHGG|BLAH222|BLAHX|||
PV1|PV1|BLAHX2|BLAHBLAH42|BLAHX||||||||20220101|
DG1|1||84|XXXX||A
DG1|2||84|XXXX||A
IN1|1||11600|TEST2

What is consistent is that there is always an MSH line as the header, and everything below it belongs with the MSH line at the top.

So I'm trying to accomplish this by applying a row numbering as below, where it goes from 1,1,1,1 to 2,2,2,2,2 incrementing by one wherever it finds the MSH line, as per below:

Data Rows:                            Numbering Needed:
MSH|BLAH|||BLAHBLAH15|BLAHZ|||        1
EVN|MOREBLAH|BLAHBLAH11|BLAHY|||      1
PID|BLAHXX|BLAHBLAH655|BLAHX|||       1
PV1|BLAHX2|BLAHBLAH42|BLAHX|||||||||  1
DG1|1||84|XXXX||A                     1
IN1|1||11400|TEST                     1
IN1|2||20100|TEST2                    1
MSH|BLAH2|BLAHBLAH5|BLAHZ|||          2
EVN|BLAH6|20220131123100              2
PV1|PV1|BLAHX2|BLAHBLAH42|BLAHX|||||| 2
DG1|1||84|XXXX||A                     2
DG1|2||84|XXXX||A                     2
IN1|1||11600|TEST2                    2

I can't use a specific row count to reset the number, ie: Every 5 rows increment the row numbering, because it's an inconsistent number of rows each time. In the example above, the first set is 7 rows and the 2nd set is 6 rows. I have to do my incrementing by the presence of the "MSH" row value, and apply the same number on down until it finds the next "MSH". I know that have to use a script task (preferably in C#) to generate this row number since my source is a file. But I just can't seem to find the right logic that will do this, since my data doesn't have a repeating key for each row that I can partition by.

Upvotes: 1

Views: 271

Answers (1)

KeithL
KeithL

Reputation: 5594

This is what I would do to meet your requirements:

  1. Read entire row in a flat file viewer

  2. Go into a script task (source). I forgot to mention to add the row as read only.

  3. Set up an output for each type.

  4. Go into the code.

  5. Set up a variable outside of main processing (in startup)

         int counter = 0;
    
  6. In main row processing:

         string[] details = Row.Split('|');
         switch(details[0])
         {
             case "MSH":
                 counter++; //increment counter 
                 OutputBufferMSH.AddRow();
                 OutputBufferMSH.Counter = counter;
                 OutputBufferMSH.Col1 = details[1];
                 // Repeat for each detail Column
                 break;
             case "EVN":
                 OutputBufferEVN.AddRow();
                 OutputBufferEVN.Counter = counter;
                 OutputBufferEVN.Col1 = details[1];
                 // Repeat for each detail Column
                 break;
             //Repeat for each row type
         }
    

I personally would not use this counter approach but actually load the MSH row and return the identity column to replace the counter.

Honestly, I would do the whole thing in a console application instead and use a StreamReader to load the flatfile. Readlines and then use the above logic to push the data into DataTables and use a Bulk Insert to load the data. But the above is the solution to do this in SSIS.

There is a lot to unpack here if you are not familiar with C# or the script task object itself.

Upvotes: 1

Related Questions