Reputation: 3471
I've tried to illustrate the problem in the (made-up) example below. Essentially, I want to filter records in the primary table based on content in a secondary table. When I attempted this using subqueries, our application performance took a big hit (some queries nearly 10x slower).
In this example I want to return all case notes for a customer EXCEPT for the ones that have references to products 1111 and 2222 in the detail table:
select cn.id, cn.summary from case_notes cn
where customer_id = 2
and exists (
select 1 from case_note_details cnd
where cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
)
I tried using a join as well:
select distinct cn.id, cn.summary from case_notes cn
join case_note_details cnd
on cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
where customer_id = 2
In both cases the execution plan shows two clustered index scans. Any suggestions for other methods or tweaks to improve performance?
Schema:
CREATE TABLE case_notes
(
id int primary key,
employee_id int,
customer_id int,
order_id int,
summary varchar(50)
);
CREATE TABLE case_note_details
(
id int primary key,
case_note_id int,
product_id int,
detail varchar(1024)
);
Sample data:
INSERT INTO case_notes
(id, employee_id, customer_id, order_id, summary)
VALUES
(1, 1, 2, 1000, 'complaint1'),
(2, 1, 2, 1001, 'complaint2'),
(3, 1, 2, 1002, 'complaint3'),
(4, 1, 2, 1003, 'complaint4');
INSERT INTO case_note_details
(id, case_note_id, product_id, detail)
VALUES
(1, 1, 1111, 'Note 1, order 1000, complaint about product 1111'),
(2, 1, 2222, 'Note 1, order 1000, complaint about product 2222'),
(3, 2, 1111, 'Note 2, order 1001, complaint about product 1111'),
(4, 2, 2222, 'Note 2, order 1001, complaint about product 2222'),
(5, 3, 3333, 'Note 3, order 1002, complaint about product 3333'),
(6, 3, 4444, 'Note 3, order 1002, complaint about product 4444'),
(7, 4, 5555, 'Note 4, order 1003, complaint about product 5555'),
(8, 4, 6666, 'Note 4, order 1003, complaint about product 6666');
Upvotes: 1
Views: 74
Reputation: 1105
You have a clustered index scan because you are not accessing your case_note_details table by its id but via non-indexed columns.
I suggest adding an index to the case-note_details table on case_note_id, product_id.
If you are always accessing the case_note_details via the case_note_id, you might also restructure your primary key to be case_note_id, detail_id. There is no need for an independent id as primary key for dependent records. This would let you re-use your detail primary key index for joins with the header table.
Edit: add an index on customer_id as well to the case_notes table, as Manuel Rocha suggested.
Upvotes: 2
Reputation: 62
In table case_notes create index for customer_id and on table case_note_details create index for case_note_id and case_note_id.
Then try execute both query. Should have better performance now.
Try also this query
select
cn.id,
cn.summary
from
case_notes cn
where
cn.customer_id = 2 and
cn.id in
(
select
distinct cnd.case_note_id
from
case_note_details cnd
where
cnd.product_id not in (1111,2222)
)
Upvotes: 1
Reputation: 1471
When using "exists" I always limit results with "TOP" as bellow:
select cn.id
,cn.summary
from case_notes as cn
where customer_id = 2
and exists (
select TOP 1 1
from case_note_details as cnd
where cnd.case_note_id = cn.id
and cnd.product_id not in (1111,2222)
)
Upvotes: 1
Reputation: 9776
Did you try "in" instead of "exists". This sometimes performs differently:
select cn.id, cn.summary from case_notes cn
where customer_id = 2
and cn.id in (
select cnd.case_note_id from case_note_details cnd
where cnd.product_id not in (1111,2222)
)
Of course, check indexes.
Upvotes: 0