Przemyslaw Remin
Przemyslaw Remin

Reputation: 6960

Optimizing GROUP BY performance

Is there some tricky way to GROUP BY a variable which has been defined by alias or which is a result of calculation? I think that the following code makes a double dip by calculating MyMonth in Select statement and then again in Group statement. It may be unnecessary waste. It is not possible by simple GROUP BY MyMonth. Is it possible to force only one calculation of month([MyDate])?

Update of code. Aggregate function is added.

SELECT month([MyDate]) AS MyMonth, count([MyDate]) AS HowMany
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY month([MyDate])
ORDER BY MyMonth

Upvotes: 0

Views: 93

Answers (2)

Clockwork-Muse
Clockwork-Muse

Reputation: 13106

Your real problem likely stems from calling MONTH(...) on every row. This prevents the optimizer from using an index to fulfill the count (it can use it for the WHERE clause, but this will still be many rows).

Instead, you should turn this into a range query, that the optimizer could use for comparisons against an index. First we build a simple range table:

WITH Months as (SELECT MONTH(d) AS month, 
                       d AS monthStart, DATEADD(month, 1, d) AS monthEnd
                FROM (VALUES(CAST('20140101' AS DATE))) t(d)
                UNION ALL
                SELECT MONTH(monthEnd), 
                       monthEnd, DATEADD(month, 1, monthEnd)
                FROM Months
                WHERE monthEnd < CAST('20150101' AS DATE))

SQL Fiddle Example
(if you have an existing calendar table, you can base your query on that, but sometimes a simple ad-hoc one works best)

Once we have the range-table, you can then use it to constrain and bucket your data, like so:

SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
  ON TableA.MyDate >= Months.monthStart
     AND TableA.MyDate < Months.monthEnd
GROUP BY Months.month

Note: The start of the date range was changed to 2014-01-01, as it seems strange that you'd only include one day from January, when aggregating months...

Upvotes: 1

jarlh
jarlh

Reputation: 44805

No, you can't use column alias directly in the GROUP BY clause. Instead do a select in the from list, and use the result column in your group by.

select MyMonth, MAX(someothercolumn)
from
(
SELECT month([MyDate]) AS MyMonth,
       someothercolumn
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-31' AND '2014-12-31'
)
GROUP BY MyMonth
ORDER BY MyMonth

Upvotes: 1

Related Questions