Sohaib
Sohaib

Reputation: 4704

Most effective way to use group function in another column

I have a query that looks something like this:

SELECT COUNT(DISTINCT A) as a_distinct,
       COUNT(DISTINCT B) as b_distinct,
       COUNT(DISTINCT A)/COUNT(DISTINCT B) as a_b_ratio
FROM
       sometable_ab

As we can see this looks very inefficient as aggregate functions are run twice even though they have been calculated. I could only think of one solution to the problem that is breaking it into two queries. Is that the only probably solution. Or is their a better more efficient solution that could be done. I am using Redshift DB which mostly uses postgresql but a solution with even MYSQL would be acceptable as I cannot think of a way in any DB to do this efficiently.

Upvotes: 2

Views: 67

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658302

Using a subquery still counts as one query for any RDBMS. More importantly, count() never returns NULL, but 0 if no row is found (or no non-null value for the given expression in any row). This would lead you straight into a division by zero exception. Fix it with NULLIF (also standard SQL). You'll get NULL in this case.

SELECT *, a_distinct / NULLIF(b_distinct, 0) AS a_b_ratio
FROM (
   SELECT count(DISTINCT a) AS a_distinct
        , count(DISTINCT b) AS b_distinct
   FROM   sometable_ab
   ) sub;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270713

If you are worried about the performance impact, just use a subquery:

SELECT a_distinct, b_distinct, a_distinct / b_distinct as a_b_ratio
FROM (SELECT COUNT(DISTINCT A) as a_distinct,
             COUNT(DISTINCT B) as b_distinct
      FROM sometable_ab
     ) ab

For most aggregation functions, this would be irrelevant, but count(distinct) can be a performance hog.

This is ANSI standard SQL and should work in any database you mention.

Upvotes: 3

Related Questions