Reputation: 189
Below is a very small example of the flat table I need to split. The first table is a Lesson table which has the ID, Name and Duration. The second table is a student table which only has the Student Name as a PK. And the third table will be a Many to Many of Lesson ID and Student Name.
Lesson Id | Lesson Name | Lesson Duration | Student1 | Student2 | Student3 | Student4 |
---|---|---|---|---|---|---|
1 | Maths | 1 Hour | Jean | Paul | Jane | Doe |
2 | English | 1 Hour | Jean | Jane | Doe |
Thanks for the assistance.
Upvotes: 0
Views: 297
Reputation: 61211
Were it me, I would design this as 3 data flows.
Since we're assuming the name is what makes a student unique, we need to build a big list of the unique names.
SELECT D.*
FROM
(
SELECT S.Student1 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student2 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student3 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student4 AS StudentName
FROM dbo.MyTable AS S
)D
WHERE D.StudentName IS NOT NULL
ORDER BY D.StudentName;
The use of UNION
in the query will handle deduplication of data and we wrap that in a derived table to filter the NULLs.
I add an explicit order by not that it's needed but since I'm assuming you're using the name as the primary key, let's avoid sort operation when we land the data.
Add an OLE DB Source to your data flow and instead of picking a table in the drop down, you'll use the above query.
Add an OLE DB Destination to the same data flow and connect the two. Assuming your target table looks something like
CREATE TABLE dbo.Student
(
StudentName varchar(50) NOT NULL CONSTRAINT PK__dbo__Student PRIMARY KEY(StudentName)
);
Dealers choice here, you can either write the query or just point at the source table.
A very good practice to get into with SSIS is to only bring the data you need into the buffers so I would write a query like
SELECT DISTINCT S.[Lesson Id], S.[Lesson Name], S.[Lesson Duration]
FROM dbo.MyTable AS S;
I favor a distinct here as I don't know enough about your data but if it were extended and a second Maths class was offered to accommodate another 4 students, it might be Lesson Id 1 again. Or it might be 3 as it indicates course time or something else.
Add an OLE DB Destination and land the data.
There's a few different ways to handle this. I'd favor the lazy way and repeat our approach from the first data flow
SELECT D.*
FROM
(
SELECT S.Student1 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student2 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student3 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student4 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
)D
WHERE D.StudentName IS NOT NULL
ORDER BY D.StudentName;
And then land in your bridge table with an OLE DB Destination and be done with it.
Do keep with the 3 data flow approach. Trying to do too much in one go is a recipe for trouble.
The operation of moving wide data to narrow data is an Unpivot operation. You'd use that in the Student and bridge table data flows but honestly, I think I've used that component less than 10 times in my career and/or answering SSIS questions here and I do a lot of that.
If the Unpivot operation generates a NULL, then yes, you'd likely want to use a Conditional Split to filter those rows out.
If your reference tables were more complex, then you'd likely be adding a Lookup component to your bridge table population step to retrieve the surrogate key.
Upvotes: 1