Does a calculation in a HAVING and in the SELECT mean it will be performed twice?

Question

Say I have this query:

SELECT CompanyId, COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING COUNT(*) > 100

I have the COUNT(*) twice in my query. Does this mean the COUNT is performed twice?

This is a simple example, but when I have a more complex calculation (something like SUM(Weight) / COUNT(*)) I'm worried it might impact performance. Or will any performance impact be negligable?

I'm using MS SQL 2012, and can't do HAVING Total > 100.

Jeroen Mostert · Accepted Answer

If you have a healthy interest in how your queries are processed under the covers, get familiar with execution plans and how to read them. All of what follows was established experimentally using those.

Aggregates are not computed more than once, but the expressions involving them are. Consider:

SELECT CompanyId, SUM(Weight) / COUNT(*)
FROM Customer
GROUP BY CompanyId
HAVING SUM(Weight) / COUNT(*) > 100

SUM(Weight) and COUNT(*) will be computed only once, but the division will be performed twice (once when filtering, and once where selecting). This has no measurable impact on performance, of course -- the key is that it minimizes the number of times it has to go through all the data.

This means that even if your HAVING is completely different from your SELECT list, the table will still only be scanned once and aggregated once:

SELECT CompanyId, MAX(Weight), MIN(Weight), COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING MAX(Weight) > 2 * MIN(Weight) AND AVG(Weight) > 0.5

There are four aggregates here: MAX(Weight), MIN(Weight), AVG(Weight) and COUNT(*).¹ The optimizer will compute all of those in one pass, group the whole by CompanyId, apply the HAVING filter and then select the desired result.²

Disclaimer: as with all claims about what the optimizer does, all this is subject to change in any release of SQL Server and may vary with trace flags, statistics, indexes, and the specifics of particular queries. The above is true for SQL Server 2012 and 2016, at least, for two particular databases, at least, where indexes play no role.

AVG isn't actually an aggregate on its own; internally the optimizer expands it to SUM / COUNT(*), with a check to prevent division by zero. So the aggregates are actually MAX, MIN, SUM and COUNT.
This is the case for a sequential plan. For a parallel plan, things get a little more complicated with multiple parallel scans that have to be joined together, but it's still true that aggregates aren't computed more than once (where possible).

Does a calculation in a HAVING and in the SELECT mean it will be performed twice?

Answers (2)

Related Questions