Reputation: 7240
Note: Although my current focus is tsql, this could well be a more general sql question.
Consider the following valid-but-also-pseudocode sample query:
select
desiredKeyCols,
case count(distinct productID)
when 0 then '0'
when 1 then '1'
else '2+'
end as [Product Count Classification]
from orders
group by desiredKeyCols
The field [Product Count Classification] will return, for each desiredKeyCols
, 0 if there are no associated productID
s, 1 for 1, and 2+ for any higher number. However, count(distinct productID)
will not stop its calculations once it has reached 2. It will happily continue to infinity, and then another operation will calculate the case.
I have seen the same thing arise multiple times.
Is there a more efficient way to implement this? If we want only 0/1+ class the answer is semi join (in/exists). But what about an arbitrary number of ranges?
Upvotes: 1
Views: 252
Reputation: 1269543
There probably is not much you can do. But here are two alternative ways to express the query that might have better performance.
If you have an index on "(desiredKeycols, productid)" and possibly "(desiredKeycols, productid desc)", you might try:
select desiredKeycols,
(case distinct_cnt . . . )
from (select o.*,
(dense_rank() over (partition by desiredKeycols order by productid) +
dense_rank() over (partition by desiredKeycols order by productid desc)
) as distinct_cnt
from orders o
) o
group by desiredKeycols;
This does not stop at "3", but it is possible that it will optimize better than count(distinct)
.
Actually, a slight alternative would use only one index:
select desiredKeycols,
(case cnt . . . )
from (select o.desiredKeycols, count(*) as cnt
from orders o
group by desiredKeycols, productid
) o
group by desiredKeycols;
In some databases this is significantly faster than count(distinct)
. However, I think SQL Server has a better optimizer, so it might not be a big win.
Upvotes: 1