Reputation: 14118
Say I have this query:
SELECT CompanyId, COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING COUNT(*) > 100
I have the COUNT(*)
twice in my query. Does this mean the COUNT
is performed twice?
This is a simple example, but when I have a more complex calculation (something like SUM(Weight) / COUNT(*)
) I'm worried it might impact performance. Or will any performance impact be negligable?
I'm using MS SQL 2012, and can't do HAVING Total > 100
.
Upvotes: 5
Views: 670
Reputation: 28789
If you have a healthy interest in how your queries are processed under the covers, get familiar with execution plans and how to read them. All of what follows was established experimentally using those.
Aggregates are not computed more than once, but the expressions involving them are. Consider:
SELECT CompanyId, SUM(Weight) / COUNT(*)
FROM Customer
GROUP BY CompanyId
HAVING SUM(Weight) / COUNT(*) > 100
SUM(Weight)
and COUNT(*)
will be computed only once, but the division will be performed twice (once when filtering, and once where selecting). This has no measurable impact on performance, of course -- the key is that it minimizes the number of times it has to go through all the data.
This means that even if your HAVING
is completely different from your SELECT
list, the table will still only be scanned once and aggregated once:
SELECT CompanyId, MAX(Weight), MIN(Weight), COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING MAX(Weight) > 2 * MIN(Weight) AND AVG(Weight) > 0.5
There are four aggregates here: MAX(Weight)
, MIN(Weight)
, AVG(Weight)
and COUNT(*)
.1 The optimizer will compute all of those in one pass, group the whole by CompanyId
, apply the HAVING
filter and then select the desired result.2
Disclaimer: as with all claims about what the optimizer does, all this is subject to change in any release of SQL Server and may vary with trace flags, statistics, indexes, and the specifics of particular queries. The above is true for SQL Server 2012 and 2016, at least, for two particular databases, at least, where indexes play no role.
AVG
isn't actually an aggregate on its own; internally the optimizer expands it to SUM / COUNT(*)
, with a check to prevent division by zero. So the aggregates are actually MAX
, MIN
, SUM
and COUNT
.Upvotes: 8
Reputation: 1270463
In all likelihood, the COUNT(*)
will be done twice, although the decision is really up to the query optimizer.
And, this pretty much makes no difference at all.
The effort expended in aggregating data is in moving data around, not in the aggregation functions, especially really simply ones like COUNT(*)
. Some aggregation functions are more expensive (COUNT(DISTINCT)
comes to mind), but in general, data movement is more expensive than a simple aggregation function.
You don't mention the database, but many allow:
HAVING Total > 100
Or the use of a subquery/CTE usually has no effect on performance.
Upvotes: 1