Reputation: 73
I have the following subquery:
SELECT
b.state,
b.city,
count(b.state) as totalCount,
sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd,
(
SELECT count(size_k)
FROM opentable_clean a
WHERE a.state = b.state
AND a.city = b.city
AND cast(replace(a.size_k, ',','.') as decimal) >= 20
GROUP BY a.state, a.city
) as Above20k
FROM opentable_clean b
GROUP BY b.state, b.city
ORDER by annualProd DESC;
This works but the query is very inefficient and takes a long time given the size of the underlying table. I am thinking using an inner join could improve performance but I have not been able to try one that works.
Any suggestions would be helpful as I am new to sql.
Upvotes: 2
Views: 1545
Reputation: 21641
It's not a join you're looking for, but a condition on the aggregate function.. something like this
select b.state,
b.city,
count(b.state) as totalCount,
sum(cast(replace(b.annual_prod, ',','.') as decimal)) AS annualProd,
SUM(CASE
WHEN cast(replace(a.size_k, ',','.') as decimal) >= 20
THEN 1
ELSE 0 END) as Above20k
FROM opentable_clean b
GROUP BY b.state, b.city
ORDER by annualProd DESC;
You will still see some hit on doing all those replaces - if you can create even just a calculated persisted column on the table to store the strings properly, your query would perform better.
The reason this will help: instead of requiring the engine to scan the table twice, it should be able to do this whole thing in one scan, since you're only working with one table anyway. If you were in fact using a second table, you'd want to use the same kind of method with an appropriate JOIN
.
Upvotes: 1
Reputation: 110111
If you want to improve the performance of a query, you should first review the execution plan and the io stats.
To see an execution plan, click the display execution plan button.
To see io stats, run the query after running SET STATISTICS IO ON
. The stats will appear with the messages.
If there are no indexes, this query will read open_table b by tableScan/clusteredIndexScan, then it will group and read open_table a by tableScan/clusteredIndexScan for each group.
The simplest index that could help would be one index on (state, city)
Upvotes: 0