Matt McCormick
Matt McCormick

Reputation: 13200

Rewrite IN subquery as JOIN

I've never had good performance with IN in MySQL and I've hit a performance issue with it again.

I'm trying to create a view. The relevant part of it is:

SELECT
  c.customer_id,
  ....
  IF (c.customer_id IN (
            SELECT cn.customer_id FROM customer_notes cn
        ), 1, 0) AS has_notes
  FROM customers c;

Basically, I just want to know if the customer has a note attached to it or not. It doesn't matter how many notes. How can I rewrite this using JOIN to speed it up?

The customers table currently has 1.5 million rows so performance is an issue.

Upvotes: 0

Views: 586

Answers (3)

Jonathan Leffler
Jonathan Leffler

Reputation: 753775

Don't you need the customer ID selected? As it stands, aren't you running the subquery once per customer, and getting a stream of true or false values with no idea which one applies to which customer?

If that is what you need, you don't need to reference the customers table (unless you keep your database in a state of semantic disintegrity and there could be entries in customer_notes for which there is no corresponding customer - but then you have bigger problems than the performance of this query); you can simply use:

SELECT DISTINCT Customer_ID
  FROM Customer_Notes
 ORDER BY Customer_ID;

to obtain the list of customer ID values with at least one entry in the Customer_Notes table.

If you want a list of Customer ID values and an associated true/false value, then you need to do a join:

SELECT C.Customer_ID,
       CASE WHEN N.Have_Notes IS NULL THEN 0 ELSE 1 END AS Has_Notes
  FROM Customers AS C
  LEFT JOIN (SELECT Customer_ID, COUNT(*) AS Have_Notes 
               FROM Customer_Notes
              GROUP BY Customer_ID) AS N
    ON C.Customer_ID = N.Customer_ID
 ORDER BY C.Customer_ID;

If this gives poor performance, check that you have an index on Customer_Notes.Customer_ID. If that isn't the issue, study the query plan.


Can't do ... in a view

The petty restrictions on what is allowed in a view is always a nuisance in any DBMS (MySQL is not alone in having restrictions). However, we can do it with a single regular join. I just remembered. COUNT(column) only counts non-null values, returning 0 if all values are null, so - if you don't mind getting a count rather than just 0 or 1 - you can use:

SELECT C.Customer_ID,
       COUNT(N.Customer_ID) AS Num_Notes
  FROM Customers AS C
  LEFT JOIN Customer_Notes AS N
    ON C.Customer_ID = N.Customer_ID
 GROUP BY C.Customer_ID
 ORDER BY C.Customer_ID;

And if you absolutely must have 0 or 1:

SELECT C.Customer_ID,
       CASE WHEN COUNT(N.Customer_ID) = 0 THEN 0 ELSE 1 END AS Has_Notes
  FROM Customers AS C
  LEFT JOIN Customer_Notes AS N
    ON C.Customer_ID = N.Customer_ID
 GROUP BY C.Customer_ID
 ORDER BY C.Customer_ID;

Note that the use of 'N.Customer_ID' is crucial - though any column in the table would do (but you've not divulged the names of any other columns, AFAICR) and I'd normally use something other than the joining column for clarity.

Upvotes: 1

Chandu
Chandu

Reputation: 82903

Try this

SELECT
  CASE WHEN cn.customer_id IS NOT NULL THEN 1
        ELSE 0
    END     AS filter_notes
  FROM customers c LEFT JOIN customer_notes cn
    ON c.customer_id= cn.customer_id

Upvotes: 0

Gabe
Gabe

Reputation: 86718

I think EXISTS suits your situation better than JOIN or IN.

SELECT 
   IF (EXISTS ( 
        SELECT *
        FROM customer_notes cn 
        WHERE c.customer_id = cn.customer_id),
       1, 0) AS filter_notes 
FROM customers 

Upvotes: 1

Related Questions