Jay P
Jay P

Reputation: 13

SSIS - Splitting String into multiple rows

Looking for some advice really. I've got a dataset for school attendance data that comes into me as: student_id, school_id, term_start_date, attendance_marks.

An example line of this data would be:

1234, 1002, 2016-09-01, '/\##/L/\##BB/\/\/\/\/\'

The attendance marks string is basically two sessions per day, each mark corresponds to various different codes. Basically, I get a start date and then have to work out each day's attendance mark going forward from that... horrible I know, but that's how I get the data I'm afraid.

I've written a script object to loop through this string and output a row for each day to get loaded into a data warehouse.

I should say before posting the code that this script works, it does what I need it to do... but it is painfully slow. I'm hoping I can draw upon the collective experience here to see if there is a more efficient way of achieving this?

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    int studentID = Row.STUDDENTID;
    int baseID = Row.SCHOOLID;
    DateTime dateKey = Row.STARTDATE;
    string marks = Row.MARKS.ToString();

    // TODO
    // This is a bodge at the moment to make sure the mark string is always divisible by 2
    // in the production release I'll have to handle this as an exception
    if (marks.Length % 2 != 0)
    {
        marks = marks + '@';
    }

    char[] c = marks.ToCharArray();

    for (int i = 0; i < c.Length; i += 2)
    {
        Output0Buffer.AddRow();
        Output0Buffer.baseID = baseID;
        Output0Buffer.studentID = studentID;
        Output0Buffer.dateKey = dateKey.AddDays(i / 2).Year * 10000 + dateKey.AddDays(i / 2).Month * 100 + dateKey.AddDays(i / 2).Day;
        Output0Buffer.markAM = c[i].ToString();
        Output0Buffer.markPM = c[i + 1].ToString();

        if (i == c.Length - 1)
        {
            base.FinishOutputs();
        }
    }
}

Is there a better way of doing this? Have I massively overthought it or do I just have to live with it taking ages to run? Thanks in advance.

Upvotes: 1

Views: 424

Answers (1)

iamdave
iamdave

Reputation: 12243

This could probably be written in Oracle Pl-SQL, though I neither work with Oracle nor recommend you doing transformations within your live system as part of the extract


If you load your data as is into a SQL Server staging environment, you can then return it in your OLEDB Source component using the script that utilises a derived tally table to split out the Attendance string into one character per row. This will work with multiple students, schools and days (ie: it is a proper set based solution), so can be used as a one pass fetch that will return your entire dataset:

-- Create test data
declare @d table(StudentID int,SchoolID int, AttDate date, Attendance nvarchar(500));
insert into @d values
 (1234, 1002, '20160901', '/\##/L/\##BB/\/\/\/\')
,(1235, 1002, '20160901', '/\##/L/\##/\BB/\/\/\')
,(1234, 1002, '20160902', '/\##/L/\##BB/\/\/\/\/\')
,(1235, 1002, '20160902', '/\##/L/\/\BB/\/\##/\/\');

              -- Create a list of 10 rows
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
              -- Cross join 6 times to reach 1,000,000 numbers if required.  Filtered to longest Attendance string in the SELECT TOP()
    ,t(t) as (select top (select max(len(Attendance)) from @d) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4,n n5,n n6)
select d.StudentID
        ,d.SchoolID
        ,d.AttDate
        ,d.Attendance
        ,t.t as AttCharNum
        ,substring(d.Attendance,t.t,1) as AttChar  -- Use SUBSTRING to retreive just the one character
from @d d
    join t  -- Join to numbers only where they are actually required
        on t.t <= len(d.Attendance)
order by d.StudentID
        ,d.AttDate
        ,t.t;

Output:

+-----------+----------+------------+------------------------+------------+---------+
| StudentID | SchoolID |  AttDate   |       Attendance       | AttCharNum | AttChar |
+-----------+----------+------------+------------------------+------------+---------+
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          1 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          2 | \       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          3 | #       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          4 | #       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          5 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          6 | L       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          7 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          8 | \       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |          9 | #       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         10 | #       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         11 | B       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         12 | B       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         13 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         14 | \       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         15 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         16 | \       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         17 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         18 | \       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         19 | /       |
|      1234 |     1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\   |         20 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          1 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          2 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          3 | #       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          4 | #       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          5 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          6 | L       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          7 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          8 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |          9 | #       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         10 | #       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         11 | B       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         12 | B       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         13 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         14 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         15 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         16 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         17 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         18 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         19 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         20 | \       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         21 | /       |
|      1234 |     1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ |         22 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          1 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          2 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          3 | #       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          4 | #       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          5 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          6 | L       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          7 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          8 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |          9 | #       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         10 | #       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         11 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         12 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         13 | B       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         14 | B       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         15 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         16 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         17 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         18 | \       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         19 | /       |
|      1235 |     1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\   |         20 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          1 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          2 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          3 | #       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          4 | #       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          5 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          6 | L       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          7 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          8 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |          9 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         10 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         11 | B       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         12 | B       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         13 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         14 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         15 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         16 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         17 | #       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         18 | #       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         19 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         20 | \       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         21 | /       |
|      1235 |     1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ |         22 | \       |
+-----------+----------+------------+------------------------+------------+---------+

Upvotes: 1

Related Questions