Reputation: 5318

Finding duplicates between two tables

I've got two SQL Server 2008 tables: one is an "Import" table containing new data and the other is a "Destination" table with the live data. The tables are similar but not identical (the Destination table has more columns, which are updated by a CRM system), but both have three "phone number" fields: Tel1, Tel2 and Tel3. I need to remove all records from the Import table where any of the phone numbers already exist in the Destination table.

I've tried knocking together a simple query (just a SELECT to test with for now):

select t2.account_id
from ImportData t2,  Destination t1 
where 
(t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))

... but I'm aware this is almost certainly Not The Way To Do Things, especially as it's very slow. Can anyone point me in the right direction?

Upvotes: 1

Answers (3)

Johan

Reputation: 1192

I am not sure about the performance of this query, but since I made the effort of writing it I will post it anyway...

;with aaa(tel)
as
(
select Tel1
from Destination
union
select Tel2
from Destination
union
select Tel3
from Destination
)
,bbb(tel, id)
as
(
select Tel1, account_id
from ImportData
union
select Tel2, account_id
from ImportData
union
select Tel3, account_id
from ImportData
)

select distinct b.id
from bbb b
where b.tel <> ''  -- skip blank numbers so empty fields don't count as matches
and b.tel in
(
select a.tel
from aaa a
intersect
select b2.tel
from bbb b2
)

Upvotes: 1

luckyluke

Reputation: 1563

This query requires a little more information than this. To write it efficiently, we need to know whether each load contains more duplicates or more new records. I assume that account_id is the primary key and has a clustered index.

I would use the temporary table approach: create a normalized table #tmp, to be indexed on the phone number and account_id, like this:

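-- TelColumn holds the source column name ('Tel1', 'Tel2' or 'Tel3');
-- account_id carries through unchanged alongside each phone number.
SELECT Phone, account_id INTO #tmp
FROM
   (SELECT account_id, Tel1, Tel2, Tel3
   FROM Destination) p
UNPIVOT
   (Phone FOR TelColumn IN
      (Tel1, Tel2, Tel3)
) AS unpvt;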

Create a nonclustered index on this table with the phone number as the first column and the account number as the second. You can't escape one full table scan, so I assume you can scan the import table (probably the smaller one). Then just join with this table and use the NOT EXISTS qualifier as explained. Then, of course, drop the table after processing.
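The remaining steps could be sketched roughly as follows. This is only an illustration under a few assumptions: #tmp already exists with a Phone column, the index name IX_tmp_Phone is made up, and only Phone is indexed here because the name of the second column depends on how #tmp was built. It returns the import rows that have no matching phone number in the destination.

```sql
-- Assumes #tmp was created by the UNPIVOT step and has a Phone column.
-- IX_tmp_Phone is a hypothetical index name.
CREATE NONCLUSTERED INDEX IX_tmp_Phone ON #tmp (Phone);

-- Import rows whose phone numbers do not appear in the destination.
SELECT i.account_id
FROM ImportData AS i
WHERE NOT EXISTS (
    SELECT 1
    FROM #tmp AS d
    WHERE d.Phone <> ''                          -- ignore blank numbers
      AND d.Phone IN (i.Tel1, i.Tel2, i.Tel3)
);

-- Clean up once processing is finished.
DROP TABLE #tmp;
```

Switching NOT EXISTS to EXISTS would list the duplicates instead, which may be the more useful check before deleting anything.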

Upvotes: 3

Jeremy Gray

Reputation: 1418

EXISTS will short-circuit: it can stop at the first matching row in Destination instead of producing every matching pair the way a join does. You could refactor the WHERE clause as well, if this still doesn't perform the way you want.

SELECT *
FROM ImportData t2
WHERE NOT EXISTS (
    select 1 
    from Destination t1
    where (t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
          or
          (t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
          or
          (t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
    )
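Since the goal in the question is to remove the duplicates from the import table, the same predicate could be turned into a DELETE by using EXISTS rather than NOT EXISTS. The following is a sketch only, not something tested against the asker's data; it is wrapped in a transaction that rolls back so the row count can be checked first.

```sql
BEGIN TRANSACTION;

-- Delete import rows where any non-blank phone number already exists
-- in any of the three Destination phone columns.
DELETE t2
FROM ImportData AS t2
WHERE EXISTS (
    SELECT 1
    FROM Destination AS t1
    WHERE (t2.Tel1 <> '' AND t2.Tel1 IN (t1.Tel1, t1.Tel2, t1.Tel3))
       OR (t2.Tel2 <> '' AND t2.Tel2 IN (t1.Tel1, t1.Tel2, t1.Tel3))
       OR (t2.Tel3 <> '' AND t2.Tel3 IN (t1.Tel1, t1.Tel2, t1.Tel3))
);

-- Number of rows the DELETE removed.
SELECT @@ROWCOUNT AS rows_deleted;

-- Change to COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;
```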

Upvotes: 1
