Drahcir

Reputation: 11972

Select from one table where not in another

I'm trying to find the rows that are in one table but not in another. The two tables are in different databases, and the columns I'm matching on have different names.

I've written the query below, and I think it probably works, but it's far too slow:

SELECT `pm`.`id`
FROM `R2R`.`partmaster` `pm`
WHERE NOT EXISTS (
    SELECT * 
    FROM `wpsapi4`.`product_details` `pd`
    WHERE `pm`.`id` = `pd`.`part_num`
)

Here is what the query is meant to do:

Select all the ids from the R2R.partmaster table that are not in the wpsapi4.product_details table. The columns I'm matching on are partmaster.id and product_details.part_num.
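The sketch below lets you experiment with the question's NOT EXISTS query using Python's built-in sqlite3 module. It makes several substitutions. SQLite replaces MySQL. ATTACH DATABASE makes r2r and wpsapi4 behave like two separate databases. The tables hold hypothetical sample rows. The helper names missing_part_ids and build_demo are illustrative only.

```python
import sqlite3


def missing_part_ids(conn: sqlite3.Connection) -> list:
    """Return partmaster ids with no matching product_details.part_num (NOT EXISTS anti-join)."""
    rows = conn.execute(
        """
        SELECT pm.id
        FROM r2r.partmaster AS pm
        WHERE NOT EXISTS (
            SELECT 1
            FROM wpsapi4.product_details AS pd
            WHERE pd.part_num = pm.id
        )
        ORDER BY pm.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def build_demo() -> sqlite3.Connection:
    # Two attached in-memory databases stand in for R2R and wpsapi4 (hypothetical data).
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS r2r")
    conn.execute("ATTACH DATABASE ':memory:' AS wpsapi4")
    conn.execute("CREATE TABLE r2r.partmaster (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE wpsapi4.product_details (part_num INTEGER)")
    conn.executemany("INSERT INTO r2r.partmaster (id) VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO wpsapi4.product_details (part_num) VALUES (?)", [(2,), (4,)])
    return conn


if __name__ == "__main__":
    print(missing_part_ids(build_demo()))  # prints [1, 3]
```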

Upvotes: 77

Views: 140564

Answers (5)

Albert Alberto

Reputation: 950

A simple workaround that worked for me:

SELECT
    first_table.*
FROM
    first_table
    LEFT JOIN second_table ON second_table.common_column = first_table.common_column 
WHERE
    second_table.common_column IS NULL;
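Here is a small runnable sketch of this LEFT JOIN ... IS NULL pattern. It assumes SQLite through Python's sqlite3 module. The names first_table, second_table and common_column, and the sample rows, are placeholders rather than a real schema.

```python
import sqlite3


def rows_only_in_first(conn: sqlite3.Connection) -> list:
    # LEFT JOIN keeps every first_table row; unmatched rows get NULL in second_table's columns.
    return conn.execute(
        """
        SELECT first_table.*
        FROM first_table
        LEFT JOIN second_table
            ON second_table.common_column = first_table.common_column
        WHERE second_table.common_column IS NULL
        ORDER BY first_table.common_column
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_table (common_column INTEGER, label TEXT)")
conn.execute("CREATE TABLE second_table (common_column INTEGER)")
conn.executemany("INSERT INTO first_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO second_table VALUES (?)", [(2,)])
print(rows_only_in_first(conn))  # prints [(1, 'a'), (3, 'c')]
```

Testing second_table.common_column for NULL is safe here. An equality test never matches NULL, so a row that did match always has a non-NULL common_column. As a result, only the unmatched rows survive the WHERE clause.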

Upvotes: 0

colmaclean

Reputation: 89

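To expand on Johan's answer: if the part_num column in the subquery can contain NULL values, the query breaks. Once the subquery returns a NULL, the condition pm.id NOT IN (...) can never evaluate to TRUE, so the query returns no rows at all.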

To fix this, add a null check:

SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN 
      (SELECT pd.part_num FROM wpsapi4.product_details pd 
                  where pd.part_num is not null)
Sorry, but I couldn't add this as a comment because I don't have enough reputation!
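The NULL problem is easy to reproduce. This sketch uses SQLite through Python's sqlite3 module with hypothetical rows. It runs the query with and without the IS NOT NULL filter against a product_details table that contains a single NULL part_num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partmaster (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE product_details (part_num INTEGER)")
conn.executemany("INSERT INTO partmaster (id) VALUES (?)", [(1,), (2,), (3,)])
# A single NULL part_num is enough to empty the unfiltered NOT IN result.
conn.executemany("INSERT INTO product_details (part_num) VALUES (?)", [(2,), (None,)])

unfiltered = conn.execute(
    "SELECT id FROM partmaster WHERE id NOT IN (SELECT part_num FROM product_details) ORDER BY id"
).fetchall()

# Same query with the null check from the answer above.
filtered = conn.execute(
    """
    SELECT id FROM partmaster
    WHERE id NOT IN (
        SELECT part_num FROM product_details WHERE part_num IS NOT NULL
    )
    ORDER BY id
    """
).fetchall()

print(unfiltered)  # prints []
print(filtered)    # prints [(1,), (3,)]
```

SQLite behaves like MySQL here. When the value is not found and the list contains a NULL, NOT IN yields NULL rather than TRUE, so every row is filtered out.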

Upvotes: 8

Johan

Reputation: 76724

Expanding on Sjoerd's anti-join, you can also use the easy-to-understand SELECT WHERE X NOT IN (SELECT) pattern.

SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN (SELECT pd.part_num FROM wpsapi4.product_details pd)

Note that you only need backticks (`) around reserved words, names containing spaces and the like, not around normal column names.

On MySQL 5+ this kind of query runs pretty fast.
On MySQL 3/4 it's slow.

Make sure you have indexes on the fields in question: you need an index on pm.id and one on pd.part_num.
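One way to check that the indexes are actually used is to inspect the query plan. The sketch below assumes SQLite through Python's sqlite3 module, where the statement is EXPLAIN QUERY PLAN. In MySQL you would prefix the query with EXPLAIN instead. The index names idx_pm_id and idx_pd_part_num are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; in the question the two tables live in different databases.
conn.execute("CREATE TABLE partmaster (id INTEGER)")
conn.execute("CREATE TABLE product_details (part_num INTEGER)")

# One index on each side of the comparison.
conn.execute("CREATE INDEX idx_pm_id ON partmaster (id)")
conn.execute("CREATE INDEX idx_pd_part_num ON product_details (part_num)")

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT pm.id FROM partmaster AS pm
    WHERE NOT EXISTS (
        SELECT 1 FROM product_details AS pd WHERE pd.part_num = pm.id
    )
    """
).fetchall()

# The last column of each plan row is a human-readable description of one step.
for row in plan:
    print(row[-1])
```

The exact wording of the plan varies between SQLite versions. In any version, though, the step that looks up product_details should name idx_pd_part_num rather than describe a full scan.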

Upvotes: 138

Drahcir

Reputation: 11972

There are loads of posts on the web that show how to do this. I found three ways, the same ones Johan and Sjoerd pointed out. I couldn't get any of these queries to work. Well, the queries themselves obviously work fine; it's my database that isn't working correctly, and all of them ran slowly.

So I worked out another way that someone else may find useful:

The basic gist is to create a temporary table, fill it with all the rows from the first table, and then delete all the rows that ARE in the other table.

So I ran these three queries, and they finished quickly (in a couple of moments).

CREATE TEMPORARY TABLE `database1`.`newRows`
SELECT `t1`.`id` AS `columnID`
FROM `database2`.`table` AS `t1`;

CREATE INDEX `columnID` ON `database1`.`newRows` (`columnID`);

-- Alias product_details and qualify its column, so the lookup cannot
-- silently resolve to the outer newRows.columnID and delete every row
DELETE FROM `database1`.`newRows`
WHERE EXISTS (
    SELECT 1
    FROM `database1`.`product_details` AS `pd`
    WHERE `pd`.`columnID` = `database1`.`newRows`.`columnID`
);
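Here is the same three-step idea (copy, index, delete the matches) as a minimal sketch. It assumes SQLite through Python's sqlite3 module instead of MySQL. The table new_rows, the column column_id and the sample data are hypothetical stand-ins for the names used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables standing in for database2.table and product_details.
conn.execute("CREATE TABLE partmaster (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE product_details (part_num INTEGER)")
conn.executemany("INSERT INTO partmaster (id) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO product_details (part_num) VALUES (?)", [(2,), (4,)])

# Step 1: copy every candidate id into a temporary table.
conn.execute("CREATE TEMP TABLE new_rows AS SELECT id AS column_id FROM partmaster")

# Step 2: index the temporary table's key column (mirrors the answer's CREATE INDEX).
conn.execute("CREATE INDEX idx_new_rows_column_id ON new_rows (column_id)")

# Step 3: delete every id that does appear in product_details.
conn.execute(
    """
    DELETE FROM new_rows
    WHERE EXISTS (
        SELECT 1 FROM product_details AS pd
        WHERE pd.part_num = new_rows.column_id
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT column_id FROM new_rows ORDER BY column_id")]
print(remaining)  # prints [1, 3]
```

The correlated lookup in step 3 probes product_details. An index on part_num there is therefore likely to matter at least as much as the index on the temporary table.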

Upvotes: 4

Sjoerd

Reputation: 75659

You can LEFT JOIN the two tables. If there is no corresponding row in the second table, that table's columns will be NULL in the result.

SELECT id FROM partmaster LEFT JOIN product_details ON (...) WHERE product_details.part_num IS NULL
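The ON (...) in this answer leaves the join condition open. This sketch assumes the condition from the question (pd.part_num = pm.id) and uses SQLite through Python's sqlite3 module with hypothetical rows. It checks that the LEFT JOIN, NOT EXISTS and NULL-filtered NOT IN versions return the same ids, even when product_details contains duplicates and a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partmaster (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE product_details (part_num INTEGER)")
conn.executemany("INSERT INTO partmaster (id) VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])
# Duplicates and a NULL, to show that all three forms still agree.
conn.executemany(
    "INSERT INTO product_details (part_num) VALUES (?)", [(2,), (2,), (5,), (None,)]
)

QUERIES = {
    # Anti-join: unmatched partmaster rows get NULL in every product_details column.
    "left_join": """
        SELECT pm.id FROM partmaster AS pm
        LEFT JOIN product_details AS pd ON pd.part_num = pm.id
        WHERE pd.part_num IS NULL
        ORDER BY pm.id
    """,
    "not_exists": """
        SELECT pm.id FROM partmaster AS pm
        WHERE NOT EXISTS (
            SELECT 1 FROM product_details AS pd WHERE pd.part_num = pm.id
        )
        ORDER BY pm.id
    """,
    # The IS NOT NULL filter keeps the NULL row from emptying the result.
    "not_in": """
        SELECT pm.id FROM partmaster AS pm
        WHERE pm.id NOT IN (
            SELECT pd.part_num FROM product_details AS pd
            WHERE pd.part_num IS NOT NULL
        )
        ORDER BY pm.id
    """,
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in QUERIES.items()}
print(results)
```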

Upvotes: 77
