user3003374
user3003374

Reputation: 73

Improve performance of correlated subquery with inner joins

I have the following subquery:

SELECT
  b.state, 
  b.city,
  count(b.state) as totalCount, 
  sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd, 
  (
    SELECT count(size_k) 
    FROM opentable_clean a  
    WHERE a.state = b.state
      AND a.city = b.city
      AND cast(replace(a.size_k, ',','.') as decimal) >= 20  
    GROUP BY a.state, a.city
  ) as Above20k  
FROM opentable_clean b
GROUP BY b.state, b.city
ORDER by annualProd DESC;  

This works but the query is very inefficient and takes a long time given the size of the underlying table. I am thinking using an inner join could improve performance but I have not been able to try one that works.

Any suggestions would be helpful as I am new to sql.

Upvotes: 2

Views: 1545

Answers (2)

Dan Field
Dan Field

Reputation: 21641

It's not a join you're looking for, but a condition on the aggregate function.. something like this

select b.state, 
    b.city,  
    count(b.state) as totalCount, 
    sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd, 
    SUM(CASE 
           WHEN cast(replace(a.size_k, ',','.') as decimal) >= 20 
           THEN 1 
           ELSE 0 END) as Above20k
    FROM opentable_clean b  
    GROUP BY b.state, b.city  
    ORDER by annualProd DESC; 

You will still see some hit on doing all those replaces - if you can create even just a calculated persisted column on the table to store the strings properly, your query would perform better.

The reason this will help: instead of requiring the engine to scan the table twice, it should be able to do this whole thing in one scan, since you're only working with one table anyway. If you were in fact using a second table, you'd want to use the same kind of method with an appropriate JOIN.

Upvotes: 1

Amy B
Amy B

Reputation: 110111

If you want to improve the performance of a query, you should first review the execution plan and the io stats.

To see an execution plan, click the display execution plan button.

To see io stats, run the query after running SET STATISTICS IO ON. The stats will appear with the messages.

If there are no indexes, this query will read open_table b by tableScan/clusteredIndexScan, then it will group and read open_table a by tableScan/clusteredIndexScan for each group.

The simplest index that could help would be one index on (state, city)

Upvotes: 0

Related Questions