Reputation: 1381
I have a table called users with roughly 250,000 records in it. I have another table called staging with around 75,000 records in it. Staging only has one column, msisdn. I want to check to see how many rows in staging are not present in users.
I have the following query, which I have tested on a small data subset, and it seems to work fine:
SELECT
s.*
FROM staging s
LEFT OUTER JOIN users u ON u.msisdn=s.msisdn
WHERE u.msisdn IS NULL
The problem however, is when I try to run this query on the full list of 250k users. It ran for an hour before I stopped it. Is there any way I can optimise this query?
I have started running the query on subsets of the data in staging, but this is horribly manual:
SELECT
s.*
FROM staging s
LEFT OUTER JOIN users u ON u.msisdn=s.msisdn
WHERE u.msisdn IS NULL
LIMIT 0,10000
msisdn is the primary key of the staging table, but it's not the primary key of the table users. I don't know if that is significant however.
Upvotes: 2
Views: 10069
Reputation: 40675
Things you can do to speed up this query:
Upvotes: 1
Reputation: 37803
First, you can see what indices MySQL is using with the EXPLAIN command. Just write EXPLAIN
in front of your query, and the results will show what index (if any) it's using. Presumably if it's that slow on so (relatively) small a data set as 250,000 records, it's not exploiting a very effective index, and you'll be able to see where.
It may also help to rewrite the query as a NOT EXISTS
like so:
SELECT s.* FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM users WHERE users.misdn = s.misdn)
Upvotes: 4
Reputation: 17041
I'm not sure how much quicker this will be, but you can try something like.
select msisdn
from staging
where msisdn not in (select msisdn from users)
Also, make sure that an index exists for the msisdn column in both tables. That should speed things up tremendously.
Upvotes: 0
Reputation: 95113
Put indexes on the msisdn
columns of each table. Since it's not a PK on users
, you'll need to put a non-clustered index on it. That should speed up your query tremendously.
Upvotes: 1