Chris Olsen
Chris Olsen

Reputation: 3471

Suggestions for improving slow performance of subquery

I've tried to illustrate the problem in the (made-up) example below. Essentially, I want to filter records in the primary table based on content in a secondary table. When I attempted this using subqueries, our application performance took a big hit (some queries nearly 10x slower).

In this example I want to return all case notes for a customer EXCEPT for the ones that have references to products 1111 and 2222 in the detail table:

select cn.id, cn.summary from case_notes cn
where customer_id = 2
and exists (
  select 1 from case_note_details cnd
  where cnd.case_note_id = cn.id
  and cnd.product_id not in (1111,2222)
)

I tried using a join as well:

select distinct cn.id, cn.summary from case_notes cn
join case_note_details cnd 
  on cnd.case_note_id = cn.id 
    and cnd.product_id not in (1111,2222)
where customer_id = 2

In both cases the execution plan shows two clustered index scans. Any suggestions for other methods or tweaks to improve performance?

Schema:

CREATE TABLE case_notes 
    (
     id int primary key, 
     employee_id int,
     customer_id int,
     order_id int,
     summary varchar(50)
    );

CREATE TABLE case_note_details 
    (
     id int primary key, 
     case_note_id int,
     product_id int,
     detail varchar(1024)
    );

Sample data:

INSERT INTO case_notes
(id, employee_id, customer_id, order_id, summary)
VALUES
(1, 1, 2, 1000, 'complaint1'),
(2, 1, 2, 1001, 'complaint2'),
(3, 1, 2, 1002, 'complaint3'),
(4, 1, 2, 1003, 'complaint4');

INSERT INTO case_note_details
(id, case_note_id, product_id, detail)
VALUES
(1, 1, 1111, 'Note 1, order 1000, complaint about product 1111'),
(2, 1, 2222, 'Note 1, order 1000, complaint about product 2222'),
(3, 2, 1111, 'Note 2, order 1001, complaint about product 1111'),
(4, 2, 2222, 'Note 2, order 1001, complaint about product 2222'),
(5, 3, 3333, 'Note 3, order 1002, complaint about product 3333'),
(6, 3, 4444, 'Note 3, order 1002, complaint about product 4444'),
(7, 4, 5555, 'Note 4, order 1003, complaint about product 5555'),
(8, 4, 6666, 'Note 4, order 1003, complaint about product 6666');

Upvotes: 1

Views: 74

Answers (4)

gabnaim
gabnaim

Reputation: 1105

You have a clustered index scan because you are not accessing your case_note_details table by its id but via non-indexed columns.

I suggest adding an index to the case-note_details table on case_note_id, product_id.

If you are always accessing the case_note_details via the case_note_id, you might also restructure your primary key to be case_note_id, detail_id. There is no need for an independent id as primary key for dependent records. This would let you re-use your detail primary key index for joins with the header table.

Edit: add an index on customer_id as well to the case_notes table, as Manuel Rocha suggested.

Upvotes: 2

Manuel Rocha
Manuel Rocha

Reputation: 62

In table case_notes create index for customer_id and on table case_note_details create index for case_note_id and case_note_id.

Then try execute both query. Should have better performance now.

Try also this query

select 
   cn.id, 
   cn.summary 
from 
   case_notes cn 
where 
   cn.customer_id = 2 and 
   cn.id in 
   (
       select 
          distinct cnd.case_note_id 
       from 
          case_note_details cnd 
       where 
          cnd.product_id not in (1111,2222)
   )

Upvotes: 1

Rubik
Rubik

Reputation: 1471

When using "exists" I always limit results with "TOP" as bellow:

select  cn.id
        ,cn.summary
from    case_notes as cn
where   customer_id = 2
and     exists (
            select  TOP 1 1
            from    case_note_details as cnd
            where   cnd.case_note_id = cn.id
            and     cnd.product_id not in (1111,2222)
)

Upvotes: 1

Necreaux
Necreaux

Reputation: 9776

Did you try "in" instead of "exists". This sometimes performs differently:

select cn.id, cn.summary from case_notes cn
where customer_id = 2
and cn.id in (
  select cnd.case_note_id from case_note_details cnd
  where cnd.product_id not in (1111,2222)
)

Of course, check indexes.

Upvotes: 0

Related Questions