ANisus
ANisus

Reputation: 77945

Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall

CREATE TABLE contacts
(
    id int PRIMARY KEY,
    phone1 varchar(20) NULL,
    phone2 varchar(20) NULL,
    phone3 varchar(20) NULL,
    phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
    list_id int NOT NULL,
    phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
    list_id ASC,
    phone ASC
);

I would like to see what contacts matches the phone number in a specific list of DoNotCall phone. For faster lookup, I have indexed donotcall on list_id and phone.

When I make the following JOIN it takes a long time (eg. 9 seconds):

SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)  

Screenshot of execution plan

Execution plan on Pastebin

While if I LEFT JOIN on each phone field seperately it runs a lot faster (eg. 1.5 seconds):

SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
    ON d1.list_id = 1
    AND d1.phone = c.phone1
LEFT JOIN donotcall d2
    ON d2.list_id = 1
    AND d2.phone = c.phone2
LEFT JOIN donotcall d3
    ON d3.list_id = 1
    AND d3.phone = c.phone3
LEFT JOIN donotcall d4
    ON d4.list_id = 1
    AND d4.phone = c.phone4
WHERE
    d1.phone IS NOT NULL
    OR d2.phone IS NOT NULL
    OR d3.phone IS NOT NULL
    OR d4.phone IS NOT NULL

Screenshot of execution plan

Execution plan on Pastebin

My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall.
So, how to do a join towards multiple columns and still have it use the index?

Upvotes: 6

Views: 22671

Answers (2)

HLGEM
HLGEM

Reputation: 96552

With this poor database structure, a UNION ALL query might be fastest.

Upvotes: 0

Andomar
Andomar

Reputation: 238078

SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.

You can test if the index would be faster with a hint:

SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)

From the query plans you posted, it shows the first plan is estimated to produce 40k rows, but it just returns 21 rows. The second plan estimates 1 row (and of course returns 21 too.)

Are your statistics up to date? Out-of-date statistics can explain the query analyzer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:

select  object_name(ind.object_id) as TableName
,       ind.name as IndexName
,       stats_date(ind.object_id, ind.index_id) as StatisticsDate
from    sys.indexes ind
order by 
        stats_date(ind.object_id, ind.index_id) desc

You can update them manually with:

EXEC sp_updatestats;

Upvotes: 7

Related Questions