SSIS Best Practices for JOINs

Question

I have been tasked w/ converting a SQL select statement into a SSIS package. However, I'm unsure how I should go about handling the JOINS. Is it best to use the Merge/Merge Join? Should I simply use this select statement in the OLE DB Connection? I want to use the method that is going to provide the best performance.

Here is my SQL:

SELECT
    AC.Sys_ID,
    AC.Number,
    Approval,
    AC.[Type],
    Category,
    AC.[Configuration item],
    AC.[Short description],
    [Planned start date],
    [Actual start],
    AC.[Planned end date],
    [Actual end],
    AC.[Assignment group],
    AC.[Assigned to],
    AC.Risk,
    [State],
    Closed,
    AC.[Close code],
    ICC.Number,
    ICC.Opened,
    ICC.[Priority],
    ICC.[Short description],
    ICC.Resolved,
    NULL AS DCIO,
    EOMONTH(DATEADD(M,-1,AC.Report_Date)) AS Reporting_Month_End_Date,
    AC.Report_Date,
    AC.Ingestion_Date
FROM
    RAW.SERVICENOW_IMPLEMENTED_CHANGES AS AC    
    LEFT OUTER JOIN (SELECT [Caused by change number], MAX(Opened) AS Max_Opened FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE GROUP BY [Caused by change number]) AS MICC ON AC.Number = MICC.[Caused by change number]
    LEFT OUTER JOIN RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE AS ICC ON MICC.[Caused by change number] = ICC.[Caused by change number] AND MICC.Max_Opened = ICC.Opened

billinkc · Accepted Answer

SQL Server, or any database, can have an index which will improve your ability to match data. SSIS won't have an index so every time you want to join data, you're either going to have to sort in the source systems or sort within the package. @Piotr and @ericBrandt have it correct in their comments - push back as the direction you're given is dumb.

That said, for this specific case, you're just enriching an existing row from your "service now implemented change" with the the service now incidents caused by change. It's a 1 to optional 1 here on the join (assuming I'm reading it correctly).

Add a Lookup Component, LKP MICC. Specify that data is sourced from query and specify it should ignore errors.

SELECT [Caused by change number], MAX(Opened) AS Max_Opened 
FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE 
GROUP BY [Caused by change number]

Map the input column [Number] to [Caused by change number] and check the Max_Opened column so it's available downstream

Add a second Lookup Component, LKP ICC. Specify that it too is sourced from query and specify it should ignore errors.

SELECT ICC.Number, ICC.Opened, ICC.[Priority], ICC.[Short description], ICC.Resolved
FROM RAW.SERVICENOW_INCIDENTS_CAUSED_BY_CHANGE

Map the input column [Max_Opened] to the lookup column [Opened]. Check the available columns.

You should be able to run the package. When it starts up, it's going to calculate all the groupings for the MICC and ICC and then cache it locally. As the rows flow through, the package will check the cache for matched data.

SSIS Best Practices for JOINs

Answers (2)

Related Questions