Sim

Reputation: 13538

Alternative to references in a GROUP BY column to the results of a correlated subquery

This question arises from a limitation in Amazon Redshift, the columnar analytics database based on ParAccel. One of its unsupported features is a reference in a GROUP BY column to the results of a correlated subquery.

For example, the following statement generates a Redshift error because the GROUP BY clause references list, a column produced by a correlated subquery.

select listing.listid,
(select count (sales.listid) from sales where sales.listid=listing.listid) as list
from listing
group by list, listing.listid; 

The following example, from Gordon Linoff's answer to the specific question that prompted this general one, is another unsupported use case.

select type, (case when cnt > XXX then url end) as url, sum(cnt) as visit_cnt
from (select type, url, count(*) as cnt
      from t
      group by type, url
     ) t
group by type, url
order by type, sum(cnt) desc;

The purpose of this question is to find a generic pattern for working around this Amazon Redshift limitation. What alternative SQL patterns achieve the same result as grouping by values from a correlated subquery?

Upvotes: 1

Views: 724

Answers (1)

Joe Harris

Reputation: 14035

A left join should do the trick, unless I'm missing something.

SELECT listing.listid
      ,COUNT(sales.listid)
FROM      listing
LEFT JOIN sales
       ON sales.listid = listing.listid
GROUP BY listing.listid
ORDER BY COUNT(sales.listid) DESC
; 
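
As a quick sanity check, the sketch below compares the correlated-subquery form with the LEFT JOIN form on a small in-memory SQLite database. The sample rows and the salesid column are made up for illustration; only listing.listid and sales.listid come from the question. SQLite accepts the correlated form, so this only checks that the two queries return the same rows, not how Redshift behaves. It also assumes listid is unique in listing; if it is not, the join would match the sales rows once per duplicate listing row and inflate the counts.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question,
# except salesid, which is invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing (listid INTEGER PRIMARY KEY);
    CREATE TABLE sales (salesid INTEGER PRIMARY KEY, listid INTEGER);
    INSERT INTO listing (listid) VALUES (1), (2), (3);
    INSERT INTO sales (salesid, listid) VALUES (10, 1), (11, 1), (12, 2);
""")

# Original shape: a correlated subquery in the select list.
correlated = conn.execute("""
    SELECT listing.listid,
           (SELECT COUNT(sales.listid)
              FROM sales
             WHERE sales.listid = listing.listid) AS list
      FROM listing
     ORDER BY listing.listid
""").fetchall()

# Rewrite: LEFT JOIN plus GROUP BY on a plain column.
# COUNT(sales.listid) ignores the NULLs produced for listings without sales,
# so those listings still appear with a count of 0.
joined = conn.execute("""
    SELECT listing.listid,
           COUNT(sales.listid) AS list
      FROM listing
      LEFT JOIN sales
        ON sales.listid = listing.listid
     GROUP BY listing.listid
     ORDER BY listing.listid
""").fetchall()

print(correlated)
print(joined)
```

Listing 3 has no sales in this sample, which is the case that would go missing if an inner join were used instead of a left join.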

Upvotes: 1
