Reputation: 13200
I've never had good performance with IN in MySQL and I've hit a performance issue with it again.
I'm trying to create a view. The relevant part of it is:
SELECT
c.customer_id,
....
IF (c.customer_id IN (
SELECT cn.customer_id FROM customer_notes cn
), 1, 0) AS has_notes
FROM customers c;
Basically, I just want to know if the customer has a note attached to it or not. It doesn't matter how many notes. How can I rewrite this using JOIN to speed it up?
The customers table currently has 1.5 million rows so performance is an issue.
Upvotes: 0
Views: 586
Reputation: 753775
Don't you need the customer ID selected? As it stands, aren't you running the subquery once per customer, and getting a stream of true or false values with no idea which one applies to which customer?
If that is what you need, you don't need to reference the customers table (unless you keep your database in a state of semantic disintegrity and there could be entries in customer_notes for which there is no corresponding customer - but then you have bigger problems than the performance of this query); you can simply use:
SELECT DISTINCT Customer_ID
FROM Customer_Notes
ORDER BY Customer_ID;
to obtain the list of customer ID values with at least one entry in the Customer_Notes table.
If you want a list of Customer ID values and an associated true/false value, then you need to do a join:
SELECT C.Customer_ID,
CASE WHEN N.Have_Notes IS NULL THEN 0 ELSE 1 END AS Has_Notes
FROM Customers AS C
LEFT JOIN (SELECT Customer_ID, COUNT(*) AS Have_Notes
FROM Customer_Notes
GROUP BY Customer_ID) AS N
ON C.Customer_ID = N.Customer_ID
ORDER BY C.Customer_ID;
If this gives poor performance, check that you have an index on Customer_Notes.Customer_ID. If that isn't the issue, study the query plan.
Can't do ... in a view
The petty restrictions on what is allowed in a view is always a nuisance in any DBMS (MySQL is not alone in having restrictions). However, we can do it with a single regular join. I just remembered. COUNT(column)
only counts non-null values, returning 0 if all values are null, so - if you don't mind getting a count rather than just 0 or 1 - you can use:
SELECT C.Customer_ID,
COUNT(N.Customer_ID) AS Num_Notes
FROM Customers AS C
LEFT JOIN Customer_Notes AS N
ON C.Customer_ID = N.Customer_ID
GROUP BY C.Customer_ID
ORDER BY C.Customer_ID;
And if you absolutely must have 0 or 1:
SELECT C.Customer_ID,
CASE WHEN COUNT(N.Customer_ID) = 0 THEN 0 ELSE 1 END AS Has_Notes
FROM Customers AS C
LEFT JOIN Customer_Notes AS N
ON C.Customer_ID = N.Customer_ID
GROUP BY C.Customer_ID
ORDER BY C.Customer_ID;
Note that the use of 'N.Customer_ID' is crucial - though any column in the table would do (but you've not divulged the names of any other columns, AFAICR) and I'd normally use something other than the joining column for clarity.
Upvotes: 1
Reputation: 82903
Try this
SELECT
CASE WHEN cn.customer_id IS NOT NULL THEN 1
ELSE 0
END AS filter_notes
FROM customers c LEFT JOIN customer_notes cn
ON c.customer_id= cn.customer_id
Upvotes: 0
Reputation: 86718
I think EXISTS
suits your situation better than JOIN
or IN
.
SELECT
IF (EXISTS (
SELECT *
FROM customer_notes cn
WHERE c.customer_id = cn.customer_id),
1, 0) AS filter_notes
FROM customers
Upvotes: 1