John Mellor
John Mellor

Reputation: 2503

MySQL: Is there a way to speed up a left join looking for rows that do not match a row in another table?

I have a query that looks like this:

SELECT * FROM foo
        LEFT JOIN bar
        ON bar.id = foo.bar_id
        WHERE bar.id IS NULL
        LIMIT 30;

It's attempting to find entries in foo that have a bar_id that does not have a corresponding entry in bar.

There is an index on foo.bar_id and a unique (incremental) index on bar.id

The output of explain looks like this:

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  foo NULL    ALL NULL    NULL    NULL    NULL    4411867 100.00  NULL
1   SIMPLE  bar NULL    eq_ref  PRIMARY PRIMARY 4   foo.bar_id  1   100.00  Using where; Not exists

(sorry, i've struggled to get the formatting of that table to display very nicely).

This query is sometimes taking 5+ minutes (although for some reason it sometimes takes 30 seconds shortly after running for 5 minutes)

Is there anyway to improve it? Or is it just a hardware limitation?


In response to the comments, this version makes no noticeable improvement in terms of performance.

        SELECT foo.* FROM foo
        LEFT JOIN bar
        ON bar.id = foo.bar_id
        WHERE bar.id IS NULL
        ORDER BY foo.id ASC
        LIMIT 30;

There is a slight difference when selecting foo.* as opposed to * but in my case I was selecting specific columns anyway, I just did not think to include them in the pseudo example.

Upvotes: 0

Views: 49

Answers (1)

Rick James
Rick James

Reputation: 142306

No

The query as written must check foo rows until it finds 30 missing rows from bar. If foo is big, and there aren't many missing rows, that could take a long time.

Maybe

You have SELECT *. See if it runs faster if it returns only SELECT foo.id. There is no need for the bar.* part of *; they will always be NULL. If id is not sufficient, you can use id to get other columns.

Yes

If you are walking through the table picking a few (30) to work on and delete, this is a terribly inefficient way to do it. It is "Order(N^2)". I have a much faster way (O(N)), which involves "remembering where you left off": http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks (That discussion is couched in terms of "deleting in chunks", but it can be adapted to other actions.)

Upvotes: 2

Related Questions