Reputation: 91
I have been tasked w/ converting a SQL select statement into a SSIS package. However, I'm unsure how I should go about handling the JOINS. Is it best to use the Merge/Merge Join? Should I simply use this select statement in the OLE DB Connection? I want to use the method that is going to provide the best performance.
Here is my SQL:
SELECT
AC.Sys_ID,
AC.Number,
Approval,
AC.[Type],
Category,
AC.[Configuration item],
AC.[Short description],
[Planned start date],
[Actual start],
AC.[Planned end date],
[Actual end],
AC.[Assignment group],
AC.[Assigned to],
AC.Risk,
[State],
Closed,
AC.[Close code],
ICC.Number,
ICC.Opened,
ICC.[Priority],
ICC.[Short description],
ICC.Resolved,
NULL AS DCIO,
EOMONTH(DATEADD(M,-1,AC.Report_Date)) AS Reporting_Month_End_Date,
AC.Report_Date,
AC.Ingestion_Date
FROM
RAW.SERVICENOW_IMPLEMENTED_CHANGES AS AC
LEFT OUTER JOIN (SELECT [Caused by change number], MAX(Opened) AS Max_Opened FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE GROUP BY [Caused by change number]) AS MICC ON AC.Number = MICC.[Caused by change number]
LEFT OUTER JOIN RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE AS ICC ON MICC.[Caused by change number] = ICC.[Caused by change number] AND MICC.Max_Opened = ICC.Opened
Upvotes: 0
Views: 393
Reputation: 11
For using data from any other database in SSIS you can simply go with the select statement because SSIS does not have its own syntax to go with. As SSIS allows the developer to choose between two different scripting languages: C# or Visual Basic(VB) etc. See where you can make this choice, drop a script task onto the control flow design surface.
Upvotes: 0
Reputation: 61201
SQL Server, or any database, can have an index which will improve your ability to match data. SSIS won't have an index so every time you want to join data, you're either going to have to sort in the source systems or sort within the package. @Piotr and @ericBrandt have it correct in their comments - push back as the direction you're given is dumb.
That said, for this specific case, you're just enriching an existing row from your "service now implemented change" with the the service now incidents caused by change. It's a 1 to optional 1 here on the join (assuming I'm reading it correctly).
Add a Lookup Component, LKP MICC. Specify that data is sourced from query and specify it should ignore errors.
SELECT [Caused by change number], MAX(Opened) AS Max_Opened
FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE
GROUP BY [Caused by change number]
Map the input column [Number] to [Caused by change number] and check the Max_Opened column so it's available downstream
Add a second Lookup Component, LKP ICC. Specify that it too is sourced from query and specify it should ignore errors.
SELECT ICC.Number, ICC.Opened, ICC.[Priority], ICC.[Short description], ICC.Resolved
FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE
Map the input column [Max_Opened] to the lookup column [Opened]. Check the available columns.
You should be able to run the package. When it starts up, it's going to calculate all the groupings for the MICC and ICC and then cache it locally. As the rows flow through, the package will check the cache for matched data.
Upvotes: 2