Alternative to references in a GROUP BY column to the results of a correlated subquery

Question

This question arises from a limitation in Amazon Redshift, the columnar analytics database based on ParAccel. One of its unsupported features is a GROUP BY clause that references a column computed by a correlated subquery.

For example, the following statement generates a Redshift error because its GROUP BY clause references list, a column produced by a correlated subquery.

select listing.listid,
(select count (sales.listid) from sales where sales.listid=listing.listid) as list
from listing
group by list, listing.listid;
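One common workaround is to compute the correlated value once per key in a pre-aggregated derived table, LEFT JOIN it back to the outer table, and GROUP BY the joined column. The grouped column is then an ordinary column rather than a correlated subquery. The sketch below is illustrative only: it runs the rewrite against SQLite through Python's sqlite3 module so that it is self-contained, the sample rows and the salesid column are made up, and it has not been checked on Redshift itself, although it uses only standard joins, COALESCE and GROUP BY. COALESCE turns the NULL produced for listings with no sales into 0, matching what the original COUNT subquery returns.

```python
import sqlite3

# Rewrite of the unsupported query: the per-listing sale count is computed once
# in a derived table, LEFT JOINed back to listing, and the GROUP BY then refers
# to an ordinary joined column instead of a correlated subquery.
REWRITTEN = """
SELECT l.listid,
       COALESCE(s.cnt, 0) AS list      -- listings with no sales get 0, as COUNT would
FROM listing AS l
LEFT JOIN (SELECT listid, COUNT(listid) AS cnt
           FROM sales
           GROUP BY listid) AS s
       ON s.listid = l.listid
GROUP BY COALESCE(s.cnt, 0), l.listid
ORDER BY l.listid
"""


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical minimal listing/sales tables; only listid matters here."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE listing (listid INTEGER PRIMARY KEY);
        CREATE TABLE sales (salesid INTEGER PRIMARY KEY, listid INTEGER);
        INSERT INTO listing (listid) VALUES (1), (2), (3);
        INSERT INTO sales (salesid, listid) VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    return conn


def sales_counts(conn: sqlite3.Connection) -> list:
    """Return (listid, sale_count) rows using the join-based rewrite."""
    return conn.execute(REWRITTEN).fetchall()


if __name__ == "__main__":
    print(sales_counts(build_sample_db()))
```

The same shape can also be written with a WITH clause instead of an inline derived table; either way, the key point is that the value being grouped on is materialized as a column before the GROUP BY sees it.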

The following query, taken from Gordon Linoff's answer to the specific question that prompted this general one, is another unsupported case.

select type, (case when cnt > XXX then url end) as url, sum(cnt) as visit_cnt
from (select type, url, count(*) as cnt
      from t
      group by type, url
     ) t
group by type, url
order by type, sum(cnt) desc;
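A possible pattern for this second case is to push the computed CASE expression down one level into its own derived table, so that the outer GROUP BY only references plain columns (type and url) produced by that inner query. The sketch below assumes a hypothetical visit table t with one row per visit and a hypothetical threshold parameter standing in for the XXX placeholder in the query above; it runs on SQLite via Python's sqlite3 module for illustration and has not been verified on Redshift.

```python
import sqlite3

# The CASE expression is evaluated in an intermediate derived table, so the
# outer GROUP BY only sees plain columns (type, url) and no computed expression.
BUCKETED = """
SELECT type, url, SUM(cnt) AS visit_cnt
FROM (SELECT type,
             CASE WHEN cnt > ? THEN url END AS url,  -- rare URLs collapse to NULL
             cnt
      FROM (SELECT type, url, COUNT(*) AS cnt
            FROM t
            GROUP BY type, url) AS per_url) AS bucketed
GROUP BY type, url
ORDER BY type, SUM(cnt) DESC
"""


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical visit log: one row per visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (type TEXT, url TEXT)")
    visits = [("a", "u1")] * 3 + [("a", "u2"), ("a", "u3")] + [("b", "u4")] * 2
    conn.executemany("INSERT INTO t (type, url) VALUES (?, ?)", visits)
    return conn


def visit_counts(conn: sqlite3.Connection, threshold: int) -> list:
    """Group visits by type and URL, folding URLs seen <= threshold times into NULL."""
    return conn.execute(BUCKETED, (threshold,)).fetchall()


if __name__ == "__main__":
    print(visit_counts(build_sample_db(), threshold=1))
```

With threshold=1, the two single-visit URLs under type "a" are merged into one NULL bucket, and every other URL keeps its own row.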

The purpose of this question is to find a generic pattern for working around this Amazon Redshift correlated-subquery limitation. What alternative SQL patterns achieve the same result as grouping by values computed in correlated subqueries?
