Abhishek Oza
Abhishek Oza

Reputation: 3480

MySQL(version 5.5): Why `JOIN` is faster than `IN` clause?

[Summary of the question: 2 SQL statements produce same results, but at different speeds. One statement uses JOIN, other uses IN. JOIN is faster than IN]

I tried a 2 kinds of SELECT statement on 2 tables, named booking_record and inclusions. The table inclusions has a many-to-one relation with table booking_record.

(Table definitions not included for simplicity.)

First statement: (using IN clause)

SELECT
  id,
  agent,
  source
FROM
  booking_record
WHERE
  id IN
  ( SELECT DISTINCT
      foreign_key_booking_record
    FROM
      inclusions
    WHERE
      foreign_key_bill IS NULL
      AND
      invoice_closure <> FALSE
  )

Second statement: (using JOIN)

SELECT
  id,
  agent,
  source
FROM
  booking_record
  JOIN
  ( SELECT DISTINCT
      foreign_key_booking_record
    FROM
      inclusions
    WHERE
      foreign_key_bill IS NULL
      AND
      invoice_closure <> FALSE
  ) inclusions
  ON
  id = foreign_key_booking_record

with 300,000+ rows in booking_record-table and 6,100,000+ rows in inclusions-table; the 2nd statement delivered 127 rows in just 0.08 seconds, but the 1st statement took nearly 21 minutes for same records.

Why JOIN is so much faster than IN clause?

Upvotes: 0

Views: 121

Answers (3)

Caff&#233;
Caff&#233;

Reputation: 1171

You should explore this by using EXPLAIN, as said by Ollie.

But in advance, note that the second command has one more filter: id = foreign_key_booking_record.

Check if this has the same performance:

SELECT
  id,
  agent,
  source
FROM
  booking_record
WHERE
  id IN
  ( SELECT DISTINCT
      foreign_key_booking_record
    FROM
      inclusions
    WHERE
      id = foreign_key_booking_record -- new filter
      AND
      foreign_key_bill IS NULL
      AND
      invoice_closure <> FALSE
  )

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

This behavior is well-documented. See here.

The short answer is that until MySQL version 5.6.6, MySQL did a poor job of optimizing these types of queries. What would happen is that the subquery would be run each time for every row in the outer query. Lots and lots of overhead, running the same query over and over. You could improve this by using good indexing and removing the distinct from the in subquery.

This is one of the reasons that I prefer exists instead of in, if you care about performance.

Upvotes: 2

Giles
Giles

Reputation: 1667

EXPLAIN should give you some clues (Mysql Explain Syntax

I suspect that the IN version is constructing a list which is then scanned by each item (IN is generally considered a very inefficient construct, I only use it if I have a short list of items to manually enter).

The JOIN is more likely constructing a temp table for the results, making it more like normal JOINs between tables.

Upvotes: 1

Related Questions