Peter
Peter

Reputation: 14118

Does a calculation in a HAVING and in the SELECT mean it will be performed twice?

Say I have this query:

SELECT CompanyId, COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING COUNT(*) > 100

I have the COUNT(*) twice in my query. Does this mean the COUNT is performed twice?

This is a simple example, but when I have a more complex calculation (something like SUM(Weight) / COUNT(*)) I'm worried it might impact performance. Or will any performance impact be negligable?

I'm using MS SQL 2012, and can't do HAVING Total > 100.

Upvotes: 5

Views: 670

Answers (2)

Jeroen Mostert
Jeroen Mostert

Reputation: 28789

If you have a healthy interest in how your queries are processed under the covers, get familiar with execution plans and how to read them. All of what follows was established experimentally using those.

Aggregates are not computed more than once, but the expressions involving them are. Consider:

SELECT CompanyId, SUM(Weight) / COUNT(*)
FROM Customer
GROUP BY CompanyId
HAVING SUM(Weight) / COUNT(*) > 100

SUM(Weight) and COUNT(*) will be computed only once, but the division will be performed twice (once when filtering, and once where selecting). This has no measurable impact on performance, of course -- the key is that it minimizes the number of times it has to go through all the data.

This means that even if your HAVING is completely different from your SELECT list, the table will still only be scanned once and aggregated once:

SELECT CompanyId, MAX(Weight), MIN(Weight), COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING MAX(Weight) > 2 * MIN(Weight) AND AVG(Weight) > 0.5

There are four aggregates here: MAX(Weight), MIN(Weight), AVG(Weight) and COUNT(*).1 The optimizer will compute all of those in one pass, group the whole by CompanyId, apply the HAVING filter and then select the desired result.2

Disclaimer: as with all claims about what the optimizer does, all this is subject to change in any release of SQL Server and may vary with trace flags, statistics, indexes, and the specifics of particular queries. The above is true for SQL Server 2012 and 2016, at least, for two particular databases, at least, where indexes play no role.


  1. AVG isn't actually an aggregate on its own; internally the optimizer expands it to SUM / COUNT(*), with a check to prevent division by zero. So the aggregates are actually MAX, MIN, SUM and COUNT.
  2. This is the case for a sequential plan. For a parallel plan, things get a little more complicated with multiple parallel scans that have to be joined together, but it's still true that aggregates aren't computed more than once (where possible).

Upvotes: 8

Gordon Linoff
Gordon Linoff

Reputation: 1270463

In all likelihood, the COUNT(*) will be done twice, although the decision is really up to the query optimizer.

And, this pretty much makes no difference at all.

The effort expended in aggregating data is in moving data around, not in the aggregation functions, especially really simply ones like COUNT(*). Some aggregation functions are more expensive (COUNT(DISTINCT) comes to mind), but in general, data movement is more expensive than a simple aggregation function.

You don't mention the database, but many allow:

HAVING Total > 100

Or the use of a subquery/CTE usually has no effect on performance.

Upvotes: 1

Related Questions