Donald Jansen
Donald Jansen

Reputation: 1985

SUM and Where clause with dates

I have two queries both give the same result, but one is faster than the other

Query one

SELECT 
    MP.MilestonePlanningID, SUM(TS.UnitsUsed) TotalUnits
FROM 
    MilestonePlanning MP
INNER JOIN 
    Timesheet TS ON MP.MilestonePlanningID = TS.MilestonePlanningID
INNER JOIN 
    PlanningAction PA ON MP.LastPlanningActionID = PA.ActionID
WHERE 
    PA.ActionNameID = 4 
    AND PA.ActionDateTime >= DATEADD(MONTH,-1,GETDATE())
GROUP BY 
    MP.MilestonePlanningID

Query two

SELECT MP.MilestonePlanningID, SUM(TS.UnitsUsed) TotalUnits
FROM MilestonePlanning MP
    INNER JOIN Timesheet TS ON MP.MilestonePlanningID = TS.MilestonePlanningID
    INNER JOIN PlanningAction PA ON MP.LastPlanningActionID = PA.ActionID
WHERE PA.ActionNameID = 4
  AND CAST(PA.ActionDateTime AS TIMESTAMP) >= CAST(DATEADD(MONTH,-1,GETDATE()) AS TIMESTAMP)
GROUP BY MP.MilestonePlanningID

Query one is the most obvious one to use and it is the one I started BUT it takes about 9 - 15 seconds to execute, however it remains instant when I remove the following

AND PA.ActionDateTime >= DATEADD(MONTH,-1,GETDATE())

Query two is instant with and without the following line

AND CAST(PA.ActionDateTime AS TIMESTAMP) >= CAST(DATEADD(MONTH,-1,GETDATE()) AS TIMESTAMP)

Query two is also not the most obvious way to look at data from last month to today. can anyone explain to me what is happening here, and what I am doing wrong in query one

Execution Plan attachedExecutionPlan

Upvotes: 2

Views: 157

Answers (2)

GarethD
GarethD

Reputation: 69759

Impossible to say for sure without the execution plans, but my guess would be that the statistics are a bit out of date, as can happen with ascending keys, so the cardinality estimation is wrong. When you include this line:

PA.ActionDateTime >= DATEADD(MONTH,-1,GETDATE())

I suspect SQL Server determines very few rows will be returned, so opts for a plan based on this, perhaps a nested loop join somewhere, or a bookmark lookup, where as a wider plan should be generated, using a hash/merge join or clustered index scan respectively.

The reason the second query works instantly with and without the date filter, is because the cast to timestamp, which is basically a cast to binary means SQL Server is unable to use the statistics on ActionDateTime, and is therefore forced into a wider plan because it is unable to determine a reasonable estimate for the number of matching rows.

Edit

Having seen the plans my suspicions are confirmed. The cardinality estimate is out, SQL server assumes incorrectly that there are very few rows in PlanningAction with ActionDateTime in the last month, it therefore determines that the best way to go about gettting the rows is to get all of the matching rows from PlanningAction, then for each of these rows go into the other two tables and get the data. For a few rows this is more efficient than first getting the data from MilestonePlanning and Timesheet then removing most of the data. For a lot of rows this is hugely inefficient.

With no where clause SQL Server realises that this will be a poor choice of plan, so uses hash matches instead of nested loop joins, which is much better suited to handling more rows of data.

The fix is to simply update your statistics, and probably create a maintenance plan to ensure the statistics are updated more frequently. Or to force the MERGE JOIN:

SELECT MP.MilestonePlanningID, SUM(TS.UnitsUsed) TotalUnits
FROM MilestonePlanning MP
    INNER MERGE JOIN Timesheet TS ON MP.MilestonePlanningID = TS.MilestonePlanningID
    INNER MERGE JOIN PlanningAction PA ON MP.LastPlanningActionID = PA.ActionID
WHERE PA.ActionNameID = 4 AND PA.ActionDateTime >= DATEADD(MONTH,-1,GETDATE())
GROUP BY MP.MilestonePlanningID;

You should use this with caution, because someone may come along and only want to query the last hour, where the nested loop join may be more efficient, but only change the DATEADD bit, and you may again have a performance problem.

You could also an index to assist with the queries, but if they are already instant it is probably not worth the extra maintenance cost of having an index.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269633

For this query:

SELECT MP.MilestonePlanningID, SUM(TS.UnitsUsed) TotalUnits
FROM MilestonePlanning MP INNER JOIN
     Timesheet TS
     ON MP.MilestonePlanningID = TS.MilestonePlanningID INNER JOIN
     PlanningAction PA
     ON MP.LastPlanningActionID = PA.ActionID
WHERE PA.ActionNameID = 4 AND PA.ActionDateTime >= DATEADD(MONTH,-1,GETDATE())
GROUP BY MP.MilestonePlanningID;

I would suggest indexes on PlanningAction(ActionNameId, ActionDateTime, ActionId), MilestonePlanning(LastPlanningActionID, MilestonePlanningID) and Timesheet(MilestonePlanningID).

The best guess (without an execution plan) is that SQL Server is making a poor choice of indexes in the first query. Timestamp is not really something you should be casting to. It is used for row versioning. No doubt, this prevents the use of an index on ActionDateTime, which probably helps the query.

Upvotes: 2

Related Questions