Reputation: 13
Looking for some advice really. I've got a dataset for school attendance data that comes into me as: student_id, school_id, term_start_date, attendance_marks.
An example line of this data would be:
1234, 1002, 2016-09-01, '/\##/L/\##BB/\/\/\/\/\'
The attendance marks string is basically two sessions per day, each mark corresponds to various different codes. Basically, I get a start date and then have to work out each day's attendance mark going forward from that... horrible I know, but that's how I get the data I'm afraid.
I've written a script object to loop through this string and output a row for each day to get loaded into a data warehouse.
I should say before posting the code that this script works, it does what I need it to do... but it is painfully slow. I'm hoping I can draw upon the collective experience here to see if there is a more efficient way of achieving this?
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
int studentID = Row.STUDDENTID;
int baseID = Row.SCHOOLID;
DateTime dateKey = Row.STARTDATE;
string marks = Row.MARKS.ToString();
// TODO
// This is a bodge at the moment to make sure the mark string is always divisible by 2
// in the production release I'll have to handle this as an exception
if (marks.Length % 2 != 0)
{
marks = marks + '@';
}
char[] c = marks.ToCharArray();
for (int i = 0; i < c.Length; i += 2)
{
Output0Buffer.AddRow();
Output0Buffer.baseID = baseID;
Output0Buffer.studentID = studentID;
Output0Buffer.dateKey = dateKey.AddDays(i / 2).Year * 10000 + dateKey.AddDays(i / 2).Month * 100 + dateKey.AddDays(i / 2).Day;
Output0Buffer.markAM = c[i].ToString();
Output0Buffer.markPM = c[i + 1].ToString();
if (i == c.Length - 1)
{
base.FinishOutputs();
}
}
}
Is there a better way of doing this? Have I massively overthought it or do I just have to live with it taking ages to run? Thanks in advance.
Upvotes: 1
Views: 424
Reputation: 12243
This could probably be written in Oracle Pl-SQL, though I neither work with Oracle nor recommend you doing transformations within your live system as part of the extract
If you load your data as is into a SQL Server staging environment, you can then return it in your OLEDB Source
component using the script that utilises a derived tally table to split out the Attendance
string into one character per row. This will work with multiple students, schools and days (ie: it is a proper set based solution), so can be used as a one pass fetch that will return your entire dataset:
-- Create test data
declare @d table(StudentID int,SchoolID int, AttDate date, Attendance nvarchar(500));
insert into @d values
(1234, 1002, '20160901', '/\##/L/\##BB/\/\/\/\')
,(1235, 1002, '20160901', '/\##/L/\##/\BB/\/\/\')
,(1234, 1002, '20160902', '/\##/L/\##BB/\/\/\/\/\')
,(1235, 1002, '20160902', '/\##/L/\/\BB/\/\##/\/\');
-- Create a list of 10 rows
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
-- Cross join 6 times to reach 1,000,000 numbers if required. Filtered to longest Attendance string in the SELECT TOP()
,t(t) as (select top (select max(len(Attendance)) from @d) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4,n n5,n n6)
select d.StudentID
,d.SchoolID
,d.AttDate
,d.Attendance
,t.t as AttCharNum
,substring(d.Attendance,t.t,1) as AttChar -- Use SUBSTRING to retreive just the one character
from @d d
join t -- Join to numbers only where they are actually required
on t.t <= len(d.Attendance)
order by d.StudentID
,d.AttDate
,t.t;
Output:
+-----------+----------+------------+------------------------+------------+---------+
| StudentID | SchoolID | AttDate | Attendance | AttCharNum | AttChar |
+-----------+----------+------------+------------------------+------------+---------+
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 1 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 2 | \ |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 3 | # |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 4 | # |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 5 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 6 | L |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 7 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 8 | \ |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 9 | # |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 10 | # |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 11 | B |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 12 | B |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 13 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 14 | \ |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 15 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 16 | \ |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 17 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 18 | \ |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 19 | / |
| 1234 | 1002 | 2016-09-01 | /\##/L/\##BB/\/\/\/\ | 20 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 1 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 2 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 3 | # |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 4 | # |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 5 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 6 | L |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 7 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 8 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 9 | # |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 10 | # |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 11 | B |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 12 | B |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 13 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 14 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 15 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 16 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 17 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 18 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 19 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 20 | \ |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 21 | / |
| 1234 | 1002 | 2016-09-02 | /\##/L/\##BB/\/\/\/\/\ | 22 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 1 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 2 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 3 | # |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 4 | # |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 5 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 6 | L |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 7 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 8 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 9 | # |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 10 | # |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 11 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 12 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 13 | B |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 14 | B |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 15 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 16 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 17 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 18 | \ |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 19 | / |
| 1235 | 1002 | 2016-09-01 | /\##/L/\##/\BB/\/\/\ | 20 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 1 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 2 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 3 | # |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 4 | # |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 5 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 6 | L |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 7 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 8 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 9 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 10 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 11 | B |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 12 | B |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 13 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 14 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 15 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 16 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 17 | # |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 18 | # |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 19 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 20 | \ |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 21 | / |
| 1235 | 1002 | 2016-09-02 | /\##/L/\/\BB/\/\##/\/\ | 22 | \ |
+-----------+----------+------------+------------------------+------------+---------+
Upvotes: 1