David Redden
David Redden

Reputation: 147

Transact SQL using EXCEPT vs INTERSECT

I have two SQL Tables that have a key based off First_Name, Last_Name & Date_Of_Birth. I am trying to write a simple transact sql to find exceptions where Table1 Patient_Key is not found in Table2

Table 1 contains 152758 records and will be a new dataset every month Table 2 contains 8388 records and will continue to grow

So my query as it stands takes over 1/2 hour to return zero reults (which I knew it would have no results due to manually querying each table separately for distinct Patient_Keys Here is the query as it stands:

 SELECT T1.*
 FROM TABLE1 T1
 WHERE  upper(T1.FIRST_NAME)  + UPPER(t1.LAST_NAME) + 
 REPLACE(CONVERT(VARCHAR (10), T1.DATE_OF_BIRTH, 120), '-','') NOT IN 
 (SELECT DISTINCT upper(T2.FIRST_NAME)  + UPPER(T2.LAST_NAME) +
 REPLACE(CONVERT(VARCHAR (10), T2.DATE_OF_BIRTH, 120), '-','')
 FROM TABLE2 T2)

Is there a more efficient SQL cost-savings method?

Upvotes: 0

Views: 119

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269603

I would recommend left join for this:

 SELECT T1.*
 FROM TABLE1 T1 LEFT JOIN
      TABLE2 T2
      ON t1.first_name = t2.first_name AND
         t1.last_name = t2.last_name AND
         t1.date_of_birth = t2.date_of_birth
 WHERE t2.first_name IS NULL;

The problem with your query are the concatenations. If you are in an environment that has case sensitive collations, then you should add computed columns in both tables that are single case.

For this query, create an index on table2(first_name, last_name, date_of_birth). That should give you the performance you need.

Upvotes: 4

Bacon Bits
Bacon Bits

Reputation: 32145

Checking for rows to exist across multiple keys works much better with a WHERE NOT EXISTS correlated subquery:

SELECT *
FROM Table1 T1
WHERE NOT EXISTS (
        SELECT 1 
        FROM Table2 T2
        WHERE T2.FIRST_NAME = T1.FIRST_NAME
            AND T2.LAST_NAME = T1.LAST_NAME
            AND T2.DATE_OF_BIRTH = T1.DATE_OF_BIRTH
    )

If your database is actually configured to use case-sensitive collation, you should use the COLLATE option to enforce case-insensitive comparisons. It's significantly more efficient. There should be an equivalent case-insensitive collation whatever your configuration.

SELECT *
FROM Table1 T1
WHERE NOT EXISTS (
        SELECT 1 
        FROM Table2 T2
        WHERE T2.FIRST_NAME = T1.FIRST_NAME COLLATE SQL_Latin1_General_CP1_CI_AS
            AND T2.LAST_NAME = T1.LAST_NAME COLLATE SQL_Latin1_General_CP1_CI_AS
            AND T2.DATE_OF_BIRTH = T1.DATE_OF_BIRTH
    )

If you have an index on Table1 (FIRST_NAME, LAST_NAME, DATE_OF_BIRTH) and Table2 (FIRST_NAME, LAST_NAME, DATE_OF_BIRTH), you should have even better performance.

Upvotes: 2

Erik Blessman
Erik Blessman

Reputation: 693

Avoid the conversions and concatenations

;WITH NotInT2 AS (
    SELECT first_name, last_name, date_of_birth
    FROM   t1
    EXCEPT
    SELECT first_name, last_name, date_of_birth
    FROM   t2
)
SELECT *
FROM   t1
JOIN   NotInT2
    ON NotInT2.first_name = t1.first_name
   AND NotInT2.last_name = t1.last_name
   AND NotInT2.date_of_birth = t1.date_of_birth

Also, only use the UPPER() function if it is necessary

Upvotes: 0

Related Questions