Reputation: 4655
I have the following query:
SELECT COUNT(*)
FROM Address adr INNER JOIN
Audit a on adr.UniqueId = a.UniqueId
The query is taking quite long to complete. I feel dumb, but is there any way to optimize it? I want to count all the address entries that have an underlying auditable.
EDIT: all your inputs are much appreciated, here are some more details:
Upvotes: 7
Views: 420
Reputation: 110071
Since you have two sets of data, ordered by the same value.. have you tried a merge join instead of the nested loop join?
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*)
FROM Address adr INNER JOIN
Auditable a on adr.UniqueId = a.UniqueId
OPTION (LOOP JOIN)
SELECT COUNT(*)
FROM Address adr INNER JOIN
Auditable a on adr.UniqueId = a.UniqueId
OPTION (MERGE JOIN)
SELECT COUNT(*)
FROM Address adr INNER JOIN
Auditable a on adr.UniqueId = a.UniqueId
OPTION (HASH JOIN)
Edit:
These explanations are conceptual. SQL Server may be doing more sophisticated operations than my examples show. This conceptual understanding, matched with the measuring of time and logical IO by the SET STATISTICS commands, and examination of query execution plans - form the basis of my query optimizing technique (grown over four years). May it serve you as well as it has me.
Setup
NestedLoop
The nested loop algorithm iterates the parent data set, and then searches the child data set once for each parent, making it cost: m * log(n)
Merge
The merge join algorithm iterates the parent data set once and the child data set once, making it cost: m + n. It relies on the data being ordered. If you ask for a merge join on un-ordered data, you will incur an ordering operation! This brings the cost to (m * log(m)) + (n * log(n)) + m + n. Even that might be better than nested loop in some cases.
Hash
The hash join algorithm iterates the parent data set once and the child data set once, making it cost: m + n. It relies on having a big enough card table to hold the entire contents of the parent data set.
Upvotes: 11
Reputation: 24132
The clause EXISTS is less expensive to run than an INNER JOIN.
select COUNT(adr.UniqueId)
from Addresses adr
where EXISTS (
select 1
from Auditables aud
where aud.UniqueId = adr.UniqueId
)
Does this suits your need?
N.B. Guids are very expensive for the database engine.
Upvotes: 1
Reputation: 3893
The real issue is the Nested Loop join. For each 1.4 Million rows in the Address table you're doing an index Seek into the Auditble table. That means 1.4M root block, branch block, and leaf block reads for a total of 4.2M block reads. The entire index is probably only 5K blocks or so... it should be doing a hash join so it reads both indexes once, and hashes through them.
If you think these tables are large, I'm guessing this is on a small box without a lot of memory. You need to make sure that you have enough memory allocated to fit the entire index into memory to make the hash join efficient.
Upvotes: 2
Reputation: 839
For large tables such as these, you may wish to partition your data to increase query performance. Also, if you haven't already, try running the Tuning Advisor to see if there are any additional indexes that may be advantageous. In addition, have you reorganized your clustered indexes lately -- is it a task that is part of a maintanence package? Many times, this will greatly improve your performance as well.
Upvotes: 0
Reputation: 62093
Missing index on the foreign key, I would say.
1.4 million and 4 million are not large tables, they are small. Say large when you go through 500 million entries, please.
For a real answer we need the execution plan / query plan, so we can see what happens.
And it would be nice to know what "Long" is in your world (given that you think 4 million rows are a lot). This question will not answer in 1 second ever - so what do you expect and what happens?
I bet, though, you have a missing index. SHort of that, I would start pointing at the hardware (because I have seen that, too, as the reason for crap performance).
Upvotes: 0
Reputation: 103589
If you run this query often and it needs to be super fast, create a materialized indexed view of it. There will be a slight overhead on INSERT/UPDATE/DELETEs, but this query will be just about instant. The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
Improving Performance with SQL Server 2005 Indexed Views
Upvotes: 6
Reputation: 32575
Is Auditable.UniqueID a foreign key reference to Address.UniqueID, meaning there are no values in Auditable that don't also exist in Address?
If so, this may work and may be faster:
SELECT COUNT(DISTINCT Auditable.UniqueID)
FROM Auditable
Note: This also assumes that UniqueID is unique(/primary key) in the Address table but not unique in the Auditable table
Upvotes: 1
Reputation: 21881
Not sure if it would be faster but you could try the following
SELECT COUNT(adr.UniqueID) FROM Address adr INNER JOIN Auditable a on adr.UniqueId = a.UniqueId
It should give you the same count because unqieieid will never be null.
Upvotes: 0