Marcus D
Marcus D

Reputation: 57

How to best join 10 plus tables on the same main table

I need to join many tables on one main table, from which I need all the rows.

Should I do left join for all of them in the same query or should I use temp tables, physical intermediary tables or window functions?

Currently the query will take very long time to complete for some 85K rows with indexes (the ones the query engine suggested) and optimized field data types.

I choose to pursue temp tables. By using five of them I was able to drastically decrease query times by distributing the queries among the temp tables. Also I added a temp table to put in all the subqueries, drastically reducing query times on each subquery.

Here is the original query:

CREATE VIEW [dwh].[Facts Tickets LVL 2 V]
AS
SELECT
/* Level 1 fields */
T.[Ticket ID]
,T.[Brand ID]
,T.[Category ID]
,T.[Channel ID]
,T.[Custom field ID]
,T.[Brand Name]
,T.[Company Group Name]
,T.[Ticket creator User ID]
,T.[Created (datetime)]
,T.[Ticket URL]
,T.[Shared URL]
,T.[Ticket type]
,T.[Status group]
,T.[Importance]
,T.[Allow channelback]
,T.[Has incidents]
,T.[Is Hidden]
,T.[Has draft-reply]
,T.[Has staff answer]
,T.[Is Assigned]
,T.[Is Assigned to bot]
,T.[Is Deleted]
,T.[Is Expired]
,T.[Is Locked]
,T.[Is Spam]
,T.[Has Attachments]
,T.[Has Satisfaction entry]
,T.[Ticket age (days)]
,T.[Ticket age (group)]
,T.[Replies (count)]
,T.[Comments (count)]
,T.[Rows (count)]
,T.[Datasync ID]
,T.[DWH Processing (datetime)]
,T.[DWH Status]

/* Level 2 fields */

,RAC.[Replies by Agent (count)] -- +2 sec
,RATC.[Replies by Ticket Creator (count)] -- +2 sec
,FCR.[First Customer-reply (datetime)] -- +2 sec
,LCR.[Last Customer-reply (datetime)] -- +1 sec
,LCR.[Due (datetime)] -- +0 sec
,FAR.[First Agent-reply (datetime)] -- +1 sec
,LAR.[Agent User ID] -- ++++
,LAR.[Last Agent-reply (datetime)] -- ++++
,LAR.[Updated (datetime)] -- ++++
,TS.[Satisfaction, scored] -- ++++
,TCWT.[Ticket creator wait time (minutes)]
,AWT.[Agent wait time (minutes)]
,ARS.[Agent total wait time (minutes)]
,ARS.[Ticket creator total wait time (minutes)]
FROM [dwh].[Facts Tickets LVL 1 T] AS T --85K in 3 sec

/* ################## Ticket satisfaction queries ################## */
/*  */
LEFT JOIN ( SELECT [Ticket ID], [Satisfaction, scored] -- 317 0 sec, 317 1      sec
        FROM [dwh].[Facts Ticket Satisfactions LVL 1 V]
        WHERE [Ticket Satisfaction ID] IN (
            SELECT MAX([Ticket Satisfaction ID])
            FROM [dwh].[Facts Ticket Satisfactions LVL 1 V]
            GROUP BY [Ticket ID] ) ) as TS
        ON T.[Ticket ID] = TS.[Ticket ID]


/* ################## Response-statistics queries ################## */
/* Ticket creator wait time */
LEFT JOIN ( SELECT [Ticket ID], [Agent reply-time (seconds)] / 60 AS [Ticket        creator wait time (minutes)] -- 445K in 2 sec, 445K in 3 sec,  418K in 5 sec
        FROM [dwh].[Facts Response-statistics LVL 1 T]
        WHERE [Response-statistics ID] IN (
            SELECT MAX([Response-statistics ID]) -- 445K in 2 sec
            FROM [dwh].[Facts Response-statistics LVL 1 T]
            GROUP BY [Ticket ID] ) ) AS TCWT
        ON T.[Ticket ID] = TCWT.[Ticket ID]

/* Agent wait time */
LEFT JOIN ( SELECT [Ticket ID], [Agent wait-time (seconds)] / 60 AS [Agent wait time (minutes)] -- 445K in 1 sec, 418K in 5 sec
        FROM [dwh].[Facts Response-statistics LVL 1 T]
        WHERE [Response-statistics ID] IN (
            SELECT MIN([Response-statistics ID]) -- Flag: takes agent first wait, not last wait time
            FROM [dwh].[Facts Response-statistics LVL 1 T]
            GROUP BY [Ticket ID] ) ) AS AWT
        ON T.[Ticket ID] = AWT.[Ticket ID]

/* Accumulated stats */
LEFT JOIN ( SELECT [Ticket ID], SUM([Agent reply-time (seconds)]) / 60 AS   [Ticket creator total wait time (minutes)], SUM([Agent wait-time (seconds)]) / 60 AS [Agent total wait time (minutes)]  --445K in 4 sec
        FROM [dwh].[Facts Response-statistics LVL 1 T]
        GROUP BY [Ticket ID]) AS ARS
        ON T.[Ticket ID] = ARS.[Ticket ID]




/* ################## Reply queries ################## */
-- 85K in 20 sec
/* [Replies by Agent (count)]. 547K in 11 sec */
LEFT JOIN ( SELECT [Ticket ID], COUNT([Reply ID]) AS [Replies by Agent (count)] -- 575K in 3 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [By Agent (Yes/No)] = 'Yes'
        GROUP BY [Ticket ID] ) AS RAC
        ON T.[Ticket ID] = RAC.[Ticket ID]

/* [Replies by Ticket Creator (count)]. 377K in 33 sec */
LEFT JOIN ( SELECT [Ticket ID], COUNT([Reply ID]) AS [Replies by Ticket Creator (count)] -- 398K in 3 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [By Agent (Yes/No)] = 'No'
        GROUP BY [Ticket ID] ) AS RATC
        ON T.[Ticket ID] = RATC.[Ticket ID]

/* First Customer Reply */
LEFT JOIN ( SELECT [Ticket ID], [Creation (datetime)] AS [First Customer-reply (datetime)] -- 398K in 5 sec, 398K in 8 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [Reply ID] IN (
            SELECT MIN([Reply ID]) -- 398K in 4 sec
            FROM [dwh].[Facts Replies LVL 1 T]
            WHERE [By Agent (1/0)] = 0
            GROUP BY [Ticket ID] ) ) AS FCR
        ON T.[Ticket ID] = FCR.[Ticket ID]

/* Last Customer Reply. 376K in 26 sec*/ --<<-- Bottleneck
LEFT JOIN ( SELECT [Ticket ID], [Creation (datetime)] AS [Last Customer-reply (datetime)], [Due (datetime)] -- 398K in 5 sec, 8 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [Reply ID] IN (
            SELECT MAX([Reply ID]) -- 398 in 4 sec
            FROM [dwh].[Facts Replies LVL 1 T]
            WHERE [By Agent (1/0)] = 0
            GROUP BY [Ticket ID] ) ) AS LCR
        ON T.[Ticket ID] = LCR.[Ticket ID]

/* First Agent Reply */ 
LEFT JOIN ( SELECT [Ticket ID], [Creation (datetime)] AS [First Agent-reply (datetime)] -- 6 sec, 9 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [Reply ID] IN (
            SELECT MIN([Reply ID]) -- 575K in 4 sec, 550K in 12 sec
            FROM [dwh].[Facts Replies LVL 1 T]
            WHERE [By Agent (1/0)] = 1
            GROUP BY [Ticket ID] ) ) AS FAR
        ON T.[Ticket ID] = FAR.[Ticket ID]

/* Last Agent Reply */ 
LEFT JOIN ( SELECT [Ticket ID], [Reply User-ID] AS [Agent User ID], [Creation (datetime)] AS [Updated (datetime)], [Creation (datetime)] AS [Last Agent-reply (datetime)] -- 573K in 9 sec, 9 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [Reply ID] IN (
            SELECT MAX([Reply ID]) -- 575K in 4 sec
            FROM [dwh].[Facts Replies LVL 1 T]
            WHERE [By Agent (1/0)] = 1
            GROUP BY [Ticket ID] ) ) AS LAR
        ON T.[Ticket ID] = LAR.[Ticket ID]


/* ################## Action queries ################## */
/* First 'Assigned the ticket to' Action. 12K in 0 sec */
LEFT JOIN ( SELECT [Ticket ID], MIN([Creation (datetime)]) AS [Initially assigned (datetime)] -- 5K in 1 sec
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action Type ID] = 28 /* Assigned the ticket to */
        GROUP BY [Ticket ID]) AS FAA
        ON T.[Ticket ID] = FAA.[Ticket ID]

/* Last 'Assigned the ticket to' Action. 12K in 0 sec*/
LEFT JOIN ( SELECT [Ticket ID], MAX([Creation (datetime)]) as [Assigned (datetime)] -- 5K in 1 sec ----------- Artur
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action Type ID] = 28 /* Assigned the ticket to */
        GROUP BY [Ticket ID]) AS LAA 
        ON T.[Ticket ID] = LAA.[Ticket ID]

/* First 'Completed this ticket' Action. 504K in 8 sec. */
LEFT JOIN ( SELECT [Ticket ID], MIN([Creation (datetime)]) AS [First Completion time (datetime)] -- 534K in 4 sec
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action Type ID] = 12 and [Action Type Value] = 'Completed' /* Completed this ticket */
        GROUP BY [Ticket ID]) AS FCT
        ON T.[Ticket ID] = FCT.[Ticket ID]

/* Last 'Completed this ticket' Action. 504K in 8 sec. */
LEFT JOIN ( SELECT [Ticket ID], MAX([Creation (datetime)]) AS [Status updated (datetime)], MAX([Creation (datetime)]) AS [Solved (datetime)], MAX([Creation (datetime)]) AS [Completion time (datetime)] -- 534K in 8 sec
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action Type ID] = 12 and [Action Type Value] = 'Completed' /* Completed this ticket */
        GROUP BY [Ticket ID]) AS LCT
        ON T.[Ticket ID] = LCT.[Ticket ID]

/* [Agent touches (count)]. 558K in 6 sec. */
LEFT JOIN ( SELECT [Ticket ID], COALESCE(COUNT(*),0) AS [Agent touches (count)]  -- 616K in 5 sec
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action User is Agent] = 'Yes'
        GROUP BY [Ticket ID]) AS ATC
        ON T.[Ticket ID] = ATC.[Ticket ID]

/* Reopens (count) 504K in 7 sec. */
LEFT JOIN ( SELECT [Ticket ID], COUNT(*) -1 AS [Reopens (count)] -- 7K in 1 sec
        FROM [dwh].[Facts Actions LVL 1 T]
        WHERE [Action Type ID] = 12 and [Action Type Value] = 'Open'
        GROUP BY [Ticket ID]
        HAVING count([Ticket ID]) > 1 ) AS AL2
        ON T.[Ticket ID] = AL2.[Ticket ID]

WHERE
    YEAR([Created (datetime)]) = 2018

Upvotes: 0

Views: 737

Answers (3)

TheGameiswar
TheGameiswar

Reputation: 28930

How to best join 10 plus tables on the same main table

I could see your derived tables are little bit complicated..SQLserver uses statistics to choose a plan and when you join multiple tables like the way you did, estimates might be off..

So i suggest,use temp table,index it and then run your query

Example:

SELECT [Ticket ID], [Creation (datetime)] AS [First Customer-reply (datetime)] -- 398K in 5 sec, 398K in 8 sec
        FROM [dwh].[Facts Replies LVL 1 T]
        WHERE [Reply ID] IN (
            SELECT MIN([Reply ID]) -- 398K in 4 sec
            FROM [dwh].[Facts Replies LVL 1 T]
            WHERE [By Agent (1/0)] = 0
            GROUP BY [Ticket ID] )

The above query output should be inserted in a temp table, should be indexed with joining key as lead column.

also, i could see most of the queries are of below form..

SELECT [Ticket ID], [Satisfaction, scored] -- 317 0 sec, 317 1      sec
        FROM [dwh].[Facts Ticket Satisfactions LVL 1 V]
        WHERE [Ticket Satisfaction ID] IN (
            SELECT MAX([Ticket Satisfaction ID])
            FROM [dwh].[Facts Ticket Satisfactions LVL 1 V]
            GROUP BY [Ticket ID] ) 

You don't need to access table twice, you can use window functions ..Example query ,for the query above

;with cte
as
(
select row_number() over (partition by ticketid order by [Ticket Satisfaction ID]) as rn
from
table
)
select * from cte where rn=1

you can remove * from my sample query and index it like below for the query to perform good

create index NCI_tcktid_trnsfrmid on table(ticketid,[Satisfaction ID])
include(somecolumns you need)

Upvotes: 1

Marcus D
Marcus D

Reputation: 57

Using a stored procedure which uses spreads the calculations among different levels of temp tables with 4-6 joins, with subqueries using a temp table shared among all the calculations, drastically reduced the query from seemingly indefinate calculation time to a calculation time of about 10-15 seconds.

Upvotes: 0

paparazzo
paparazzo

Reputation: 45096

This is just silly to me. You only report if it is the MAX([Ticket Satisfaction ID]) of all the tickets. Why not report the max of that ticket? Less work and more information. Use the same pattern for all the joins.

LEFT JOIN ( SELECT [Ticket ID], max([Satisfaction, scored]) as max 
              FROM [dwh].[Facts Ticket Satisfactions LVL 1 V]
             GROUP BY [Ticket ID] 
          ) as TS
       ON T.[Ticket ID] = TS.[Ticket ID]

If you only want the max of all then

LEFT JOIN ( SELECT [Ticket ID], [Satisfaction, scored]) 
                 , DENSE_RANK() over (partition by [Ticket ID] 
                                      order by [Satisfaction, scored] desc) as dr  
              FROM [dwh].[Facts Ticket Satisfactions LVL 1 V] 
          ) as TS
       ON T.[Ticket ID] = TS.[Ticket ID]
      AND TS.dr = 1

Upvotes: 1

Related Questions