Reputation: 137
I have the following query in SQL Server Management Studio 18, let's call it Query1:
SELECT
stage.IDContratto,
SUM(stageReg.Costo) AS Costo
FROM STAGING.TabContrattiRedditivita AS stage
INNER JOIN STAGING.TabCommesse AS stageCom ON stage.CodiceContratto = stageCom.CodiceContrattoCommessa
INNER JOIN STAGING.TabRegistrazioneOreRisorse AS stageReg
ON stageCom.CodiceCommessa = stageReg.CodiceCommessaCalcolato
AND stageReg.DataRegistrazione BETWEEN stage.StartDate AND stage.EndDate
WHERE stageCom.SeMotivoNonFatturabilePerditaCommessa = 1
GROUP BY stage.IDContratto
TabContrattiRedditivita has 16K rows, TabCommesse has 49K rows, and TabRegistrazioneOreRisorse has 6.8 MLN rows. Query1 returns 1.200 rows. Because of the IX_CostiCommessa non-clustered index I put on TabRegistrazioneOreRisorse (details below) this query completes in about 3 min, which all in all is fine to me. You can see the actual exec plan here.
However I actually use Query1 inside an UPDATE of TabContrattiRedditivita, let's call it Query2:
UPDATE STAGING.TabContrattiRedditivita
SET
ActualCostoCommesseNonFatturanti += costi.Costo,
TotaleCostoCommesseNonFatturanti += costi.Costo
FROM STAGING.TabContrattiRedditivita AS stage
INNER JOIN (Query1) AS costi ON stage.IDContratto = costi.IDContratto
And Query 2 completes in 16 min or more, which is not fine. You can see the actual exec plan here.
You might think it's a problem of writing operations, but in the following I report some strange facts that led me to think it is not.
First, Query1 returns just 1.200 rows, so the writing operations are insignificant (in my ETL I do UPDATEs 2 to 3 orders of magnitude higher without any performance problem). Second, as you can see above the actual exec plan of the subquery Query1 inside Query2 looks identical to the actual exec plan of Query1 executed alone (except for percentages, of course). Third, live statistics about Query2 seems to reveal that the Index Seek on TabRegistrazioneOreRisorse is slowing down Query2, not the UPDATE operation, which instead takes < 1 sec (notice the total running time was 17 min 11 sec):
This is the same Index Seek that in Query1 only took about 3 min (total running time: 3 min 10 sec):
So it seems like the mere presence of the UPDATE is causing Query1 to slow down dramatically even before the UPDATE is executed.
Here come the twist: if I copy my datawarehouse tables TabContrattiRedditivita, TabCommesse and TabRegistrazioneOreRisorse into temp tables #Tab1, #Tab2 and #Tab3 respectively, and then I create the same PKs and indexes on these temp tables, then all suddenly works. Query1:
SELECT
stage.IDContratto,
SUM(stageReg.Costo) AS Costo
FROM #Tab1 AS stage
INNER JOIN #Tab2 AS stageCom ON stage.CodiceContratto = stageCom.CodiceContrattoCommessa
INNER JOIN #Tab3 AS stageReg
ON stageCom.CodiceCommessa = stageReg.CodiceCommessaCalcolato
AND stageReg.DataRegistrazione BETWEEN stage.StartDate AND stage.EndDate
WHERE stageCom.SeMotivoNonFatturabilePerditaCommessa = 1
GROUP BY stage.IDContratto
Execution time about 3 min, just as the previous Query1; actual exec plan here. Query2:
UPDATE #Tab1
SET
ActualCostoCommesseNonFatturanti += costi.Costo,
TotaleCostoCommesseNonFatturanti += costi.Costo
FROM #Tab1 AS stage
INNER JOIN (Query1) AS costi ON stage.IDContratto = costi.IDContratto
Execution time about 3 min 10 sec, instead of 16 or 17 min like the previous Query2; actual exec plan here.
How can this be? Any clue about how to fix this?
Note: I also tried a couple of alternatives, which revealed themselves uneffective.
I tried to use a #temp table: I put Query1 INTO #temp
, then executing Query2 this way:
UPDATE STAGING.TabContrattiRedditivita
SET
ActualCostoCommesseNonFatturanti += costi.Costo,
TotaleCostoCommesseNonFatturanti += costi.Costo
FROM STAGING.TabContrattiRedditivita AS stage
INNER JOIN #temp AS costi ON stage.IDContratto = costi.IDContratto
The results are the same, but this time Query1 is the slow part: Query1 runs in 16 min, with the Index Seek on Tab3 very slow, then Query2 runs in few seconds.
I also tried to use a CTE in two ways. Way number 1:
WITH CostoRegistrazioni AS (Query1)
UPDATE STAGING.TabContrattiRedditivita
SET
ActualCostoCommesseNonFatturanti += costi.Costo,
TotaleCostoCommesseNonFatturanti += costi.Costo
FROM STAGING.TabContrattiRedditivita AS stage
INNER JOIN CostoRegistrazioni AS costi ON stage.IDContratto = costi.IDContratto
Way number 2:
WITH updateStage AS (
SELECT
ActualCostoCommesseNonFatturanti,
TotaleCostoCommesseNonFatturanti,
costi.Costo
FROM STAGING.TabContrattiRedditivita AS stage
INNER JOIN (Query1) AS costi ON stage.IDContratto = costi.IDContratto
)
UPDATE updateStage
SET
ActualCostoCommesseNonFatturanti += Costo,
TotaleCostoCommesseNonFatturanti += Costo
In both cases same result: Query1 runs in 16 min, with the Index Seek on TabRegistrazioneOreRisorse very slow.
Technical details
CREATE NONCLUSTERED INDEX [IX_CostiCommessa]
ON [STAGING].[TabRegistrazioneOreRisorse] (
[DataRegistrazione] ASC,
[CodiceCommessaCalcolato] ASC
)
INCLUDE (
[Costo],
[SeRisorsaInterna],
[SeRisolutivo]
)
WITH (
PAD_INDEX = ON,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 100
)
SELECT *
INTO #Tab1
FROM STAGING.TabContrattiRedditivita
SELECT *
INTO #Tab2
FROM STAGING.TabCommesse
SELECT *
INTO #Tab3
FROM STAGING.TabRegistrazioneOreRisorse
You may have noticed some missed index notifications in the actual exec plans above. I already tried to create them, the only effect was slowing down the queries (even Query1).
The warning you may have seen in the actual exec plan of Query2 is an ExcessiveGrant, which I'm not sure how to interpret:
Upvotes: 0
Views: 284
Reputation: 71586
Directly updating an updatable CTE should prove to be faster, as you don't need to re-query #Tab1
:
WITH costi AS (
SELECT
stage.ActualCostoCommesseNonFatturanti,
stage.TotaleCostoCommesseNonFatturanti,
SUM(stageReg.Costo) OVER (PARTITION BY stage.IDContratto) AS Costo
FROM #Tab1 AS stage
INNER JOIN #Tab2 AS stageCom ON stage.CodiceContratto = stageCom.CodiceContrattoCommessa
INNER JOIN #Tab3 AS stageReg
ON stageCom.CodiceCommessa = stageReg.CodiceCommessaCalcolato
AND stageReg.DataRegistrazione BETWEEN stage.StartDate AND stage.EndDate
WHERE stageCom.SeMotivoNonFatturabilePerditaCommessa = 1
)
UPDATE costi
SET
ActualCostoCommesseNonFatturanti += costi.Costo,
TotaleCostoCommesseNonFatturanti += costi.Costo;
I would also recommend the following indexes:
stage (IDContratto) INCLUDE (CodiceContratto, StartDate, EndDate, ActualCostoCommesseNonFatturanti, TotaleCostoCommesseNonFatturanti)
stageCom (SeMotivoNonFatturabilePerditaCommessa, CodiceContrattoCommessa) INCLUDE (CodiceCommessa)
stageReg (CodiceCommessaCalcolato, DataRegistrazione) INCLUDE (Costo)
You could alternately make a filtered index on stageCom
stageCom (CodiceContrattoCommessa) INCLUDE (CodiceCommessa, SeMotivoNonFatturabilePerditaCommessa) WHERE (SeMotivoNonFatturabilePerditaCommessa = 1)
Upvotes: 1