Improve performance of correlated subquery with inner joins

Question

I have the following subquery:

SELECT
  b.state, 
  b.city,
  count(b.state) as totalCount, 
  sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd, 
  (
    SELECT count(size_k) 
    FROM opentable_clean a  
    WHERE a.state = b.state
      AND a.city = b.city
      AND cast(replace(a.size_k, ',','.') as decimal) >= 20  
    GROUP BY a.state, a.city
  ) as Above20k  
FROM opentable_clean b
GROUP BY b.state, b.city
ORDER by annualProd DESC;

This works but the query is very inefficient and takes a long time given the size of the underlying table. I am thinking using an inner join could improve performance but I have not been able to try one that works.

Any suggestions would be helpful as I am new to sql.

Dan Field · Accepted Answer

It's not a join you're looking for, but a condition on the aggregate function.. something like this

select b.state, 
    b.city,  
    count(b.state) as totalCount, 
    sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd, 
    SUM(CASE 
           WHEN cast(replace(a.size_k, ',','.') as decimal) >= 20 
           THEN 1 
           ELSE 0 END) as Above20k
    FROM opentable_clean b  
    GROUP BY b.state, b.city  
    ORDER by annualProd DESC;

You will still see some hit on doing all those replaces - if you can create even just a calculated persisted column on the table to store the strings properly, your query would perform better.

The reason this will help: instead of requiring the engine to scan the table twice, it should be able to do this whole thing in one scan, since you're only working with one table anyway. If you were in fact using a second table, you'd want to use the same kind of method with an appropriate JOIN.

Improve performance of correlated subquery with inner joins

Answers (2)

Related Questions