Reputation: 688
I'm not sure how best to tune this query and/ or indexes to avoid a blunt FORCE ORDER hint.
This main query runs fine, currently returns 0 rows in 0 seconds:
SELECT S1.ID, S.LOAD_DATE, s.Deleted,S1.HUB_FORM_ID
FROM #TMP S
INNER JOIN HUB_FORM H1 ON
H1.Form_ID = S.HUB_FORM_BK
INNER JOIN HUB_ORG H2 ON
H2.Organisation_ID = S.HUB_ORG_BK
INNER JOIN HUB_PERSON H3 ON
H3.person_id = S.HUB_PERSON_BK
INNER JOIN HUB_EVENT H4 ON
H4.job_id = S.HUB_EVENT_BK
INNER JOIN HUB_WORKFLOW_STEP H5 ON
H5.step_id = S.HUB_WORKFLOW_STEP_BK
INNER JOIN LNK_FORM_ENTITY S1 ON
H1.HUB_FORM_ID = S1.HUB_FORM_ID AND H2.HUB_ORG_ID = S1.HUB_ORG_ID AND H3.HUB_PERSON_ID = S1.HUB_PERSON_ID AND H4.HUB_EVENT_ID = S1.HUB_EVENT_ID AND H5.HUB_WORKFLOW_STEP_ID = S1.HUB_WORKFLOW_STEP_ID
INNER JOIN DK_SAT_LNK_FORM_ENTITY S2 ON
S1.ID = S2.Parent_ID
Adding a WHERE clause on S2.LOAD_DATE_TO makes it run and run (killed off after a minute or two).
WHERE S2.LOAD_DATE_TO = '31/12/9999'
I'm not sure why that happens as:
NB - it doesn't always return 0 rows, but it needs to run (and complete in a reasonable time) whether rows are returned or not.
CREATE NONCLUSTERED INDEX [JM_TEST_190221_2] ON [dbo].[DK_SAT_LNK_FORM_ENTITY]
(
[Parent_ID] ASC,
[LOAD_DATE_TO] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The live query plan shows it running through millions of rows in the LNK_ and DK_ tables and subsequently joined tables, whereas in the original plan it shows actual number of rows = 56 (56 executions - expected 1 row) on the LNK_ table and 0 actual rows (56 executions) on the DK_ table.
If I add OPTION (FORCE ORDER)
after the WHERE clause, it runs in 0 seconds again, with a different query plan to the original good one.
Clearly that resolves the issue in the short term, but I'm wary of using such a blunt instrument given that it may not always be the optimal choice as data changes over time.
Edit I have tried updating statistics with FULL SCAN, and rebuilding key indexes but it had no impact.
Query plans below - any tips or explanation gratefully received!
Original good plan (actual plan): no WHERE clause : https://www.brentozar.com/pastetheplan/?id=HyG3SwTZd
Poor plan (from live query plan at point killed off) : https://www.brentozar.com/pastetheplan/?id=rJpBSPpWO
Good plan with FORCE ORDER hint : https://www.brentozar.com/pastetheplan/?id=SJqxUvT-d
Upvotes: 2
Views: 184
Reputation: 71648
Clearly, your issue is that HUB_FORM
is selective enough that it is limiting the rows down to 0 at the very beginning. But the optimizer does not realize that and therefore it is reversing the order of the joins.
To enforce the order without hammering the rest of the query via FORCE ORDER
, we have two options:
#TMP, HUB_FORM
into a temp table or table variable. This can often cause a fair bit of extra IO.This is often best done by putting the join inside a subquery with a SELECT TOP
, but you may need to modify this by adding one or two further joins.
SELECT S1.ID, S.LOAD_DATE, s.Deleted, S1.HUB_FORM_ID
FROM (
SELECT TOP (9223372036854775807) S.*
FROM #TMP S
INNER JOIN HUB_FORM H1 ON
H1.Form_ID = S.HUB_FORM_BK
) S
INNER JOIN HUB_ORG H2 ON
H2.Organisation_ID = S.HUB_ORG_BK
INNER JOIN HUB_PERSON H3 ON
H3.person_id = S.HUB_PERSON_BK
INNER JOIN HUB_EVENT H4 ON
H4.job_id = S.HUB_EVENT_BK
INNER JOIN HUB_WORKFLOW_STEP H5 ON
H5.step_id = S.HUB_WORKFLOW_STEP_BK
INNER JOIN LNK_FORM_ENTITY S1 ON
H1.HUB_FORM_ID = S1.HUB_FORM_ID AND H2.HUB_ORG_ID = S1.HUB_ORG_ID AND H3.HUB_PERSON_ID = S1.HUB_PERSON_ID AND H4.HUB_EVENT_ID = S1.HUB_EVENT_ID AND H5.HUB_WORKFLOW_STEP_ID = S1.HUB_WORKFLOW_STEP_ID
INNER JOIN DK_SAT_LNK_FORM_ENTITY S2 ON
S1.ID = S2.Parent_ID
If that doesn't work, you may be able to persuade it by changing the TOP
to a variable, and adding an OPTIMIZE FOR
hint at the end:
DECLARE @topRows bigint = 9223372036854775807;
SELECT S1.ID, S.LOAD_DATE, s.Deleted, S1.HUB_FORM_ID
FROM (
SELECT TOP (@topRows) S.*
FROM #TMP S
INNER JOIN HUB_FORM H1 ON
H1.Form_ID = S.HUB_FORM_BK
) S
INNER JOIN HUB_ORG H2 ON
.........
OPTION (OPTIMIZE FOR (@topRows = 1));
This causes the optimizer to think it will only get 1 row out of the join, but actually allows more rows if that is the case at a runtime.
Note that none of this changes the essential semantics of the query
Upvotes: 1