Reputation: 31
I have a file source where the data is not in normalized form with any sort of primary key value or repeating group value. I am using Merge Join to put the multiple rows into one merged row. I need to apply some row numbering so that I have a join between the multiple rows, to get them into the one single row for the merge join.
Here is what the source data looks like:
Data Rows:
MSH|BLAH|||BLAHBLAH15|BLAHZ|||
EVN|MOREBLAH|BLAHBLAH11|BLAHY|||
PID|BLAHXX|BLAHBLAH655|BLAHX|||
PV1|BLAHX2|BLAHBLAH42|BLAHX|||||||||
DG1|1||84|XXXX||A
IN1|1||11400|TEST
IN1|2||20100|TEST2
MSH|BLAH2|BLAHBLAH5|BLAHZ|||
EVN|BLAH6|20220131123100
PID|BLAHGG|BLAH222|BLAHX|||
PV1|PV1|BLAHX2|BLAHBLAH42|BLAHX||||||||20220101|
DG1|1||84|XXXX||A
DG1|2||84|XXXX||A
IN1|1||11600|TEST2
What is consistent is that there is always an MSH line as the header, and everything below it belongs with the MSH line at the top.
So I'm trying to accomplish this by applying a row numbering as below, where it goes from 1,1,1,1 to 2,2,2,2,2 incrementing by one wherever it finds the MSH line, as per below:
Data Rows: Numbering Needed:
MSH|BLAH|||BLAHBLAH15|BLAHZ||| 1
EVN|MOREBLAH|BLAHBLAH11|BLAHY||| 1
PID|BLAHXX|BLAHBLAH655|BLAHX||| 1
PV1|BLAHX2|BLAHBLAH42|BLAHX||||||||| 1
DG1|1||84|XXXX||A 1
IN1|1||11400|TEST 1
IN1|2||20100|TEST2 1
MSH|BLAH2|BLAHBLAH5|BLAHZ||| 2
EVN|BLAH6|20220131123100 2
PV1|PV1|BLAHX2|BLAHBLAH42|BLAHX|||||| 2
DG1|1||84|XXXX||A 2
DG1|2||84|XXXX||A 2
IN1|1||11600|TEST2 2
I can't use a specific row count to reset the number, ie: Every 5 rows increment the row numbering, because it's an inconsistent number of rows each time. In the example above, the first set is 7 rows and the 2nd set is 6 rows. I have to do my incrementing by the presence of the "MSH" row value, and apply the same number on down until it finds the next "MSH". I know that have to use a script task (preferably in C#) to generate this row number since my source is a file. But I just can't seem to find the right logic that will do this, since my data doesn't have a repeating key for each row that I can partition by.
Upvotes: 1
Views: 271
Reputation: 5594
This is what I would do to meet your requirements:
Read entire row in a flat file viewer
Go into a script task (source). I forgot to mention to add the row as read only.
Set up an output for each type.
Go into the code.
Set up a variable outside of main processing (in startup)
int counter = 0;
In main row processing:
string[] details = Row.Split('|');
switch(details[0])
{
case "MSH":
counter++; //increment counter
OutputBufferMSH.AddRow();
OutputBufferMSH.Counter = counter;
OutputBufferMSH.Col1 = details[1];
// Repeat for each detail Column
break;
case "EVN":
OutputBufferEVN.AddRow();
OutputBufferEVN.Counter = counter;
OutputBufferEVN.Col1 = details[1];
// Repeat for each detail Column
break;
//Repeat for each row type
}
I personally would not use this counter approach but actually load the MSH row and return the identity column to replace the counter.
Honestly, I would do the whole thing in a console application instead and use a StreamReader to load the flatfile. Readlines and then use the above logic to push the data into DataTables and use a Bulk Insert to load the data. But the above is the solution to do this in SSIS.
There is a lot to unpack here if you are not familiar with C# or the script task object itself.
Upvotes: 1