Samuel Parrino
Samuel Parrino

Reputation: 3

Alternative to NOT IN?

I need to check two tables and find inconsistencies, ie where the value of table T1 is not present in the italy_cities table. I'll explain:

T1: Includes personal data (with place of birth) italy_city: Includes all the municipalities of Italy.

Table T1 has about 9000 tuples. T2 has 7,903 tuples.

Using "NOT IN" the query takes approximately 16 seconds to execute.

Here is the query:

SELECT
    `T1`.* 
FROM
    T1
WHERE
    (
    `T1`.place NOT IN ( SELECT municipality FROM italy_cities ) 
    )

MY QUESTION
what is the best and fast option to check for inconsistencies? to check all the "incorrect" municipalities that do not exist in the official database?

Thanks in advance

Upvotes: 0

Views: 295

Answers (2)

Popeye
Popeye

Reputation: 35900

Not exists can perform better but there is also another way which is left join as follows:

SELECT T1.* 
  FROM T1
  LEFT JOIN italy_cities I ON I.municipality = T1.PLACE
 WHERE I.municipality IS NULL;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

I generally recommend NOT EXISTS for this purpose:

SELECT T1.* 
FROM T1
WHERE NOT EXISTS (SELECT 1
                  FROM italy_cities ic
                  WHERE t1.place = ic.municipality
                 );

Why? There are two reasons:

  1. NOT IN does not do what you expect if the subquery returns any NULL values. If even one value is NULL all rows end up being filtered out.
  2. This version of the query can take advantage of an index on italy_cities(municipality) which seems like a reasonable index on the table.

Upvotes: 1

Related Questions