Alan Geleynse
Alan Geleynse

Reputation: 25139

Select where column is null or not in subquery

I have 2 tables with names and a schedule of dates related to those names. I am trying to get a list of all names that have never been scheduled or have not been scheduled in the last 2 months.

   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON n.id = s.nameid
    WHERE s.nameid IS NULL 
       OR s.nameid NOT IN (SELECT nameid 
                             FROM schedule 
                            WHERE date > NOW() - INTERVAL 2 MONTH)
 GROUP BY n.id

When I run this query MySQL takes over the CPU and doesn't respond.

When I change it to either of these I get results, but only half the results I am looking for:

   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON n.id = s.nameid
    WHERE s.nameid IS NULL
 GROUP BY n.id

   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON n.id = s.nameid
    WHERE s.nameid NOT IN (SELECT nameid 
                             FROM schedule 
                            WHERE date > NOW() - INTERVAL 2 MONTH)
 GROUP BY n.id

I am not sure how to get this query working to return all the results or why it takes over the CPU.

Upvotes: 2

Views: 2982

Answers (2)

Morgon
Morgon

Reputation: 3319

Without knowing your full schema, is there something wrong with

   SELECT n.name, max(s.date)
     FROM names n
LEFT JOIN schedule s 
       ON n.id = s.nameid
    WHERE s.nameid IS NULL
       OR s.date <= DATE_SUB(CURDATE(), INTERVAL 2 MONTH)
 GROUP BY n.id

Upvotes: 0

OMG Ponies
OMG Ponies

Reputation: 332681

OR is a notorious poor performer, in every database I can think of. A UNION if often a better alternative:

   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON n.id = s.nameid
    WHERE s.nameid IS NULL
 GROUP BY n.id
UNION
   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON n.id = s.nameid
    WHERE s.nameid NOT IN (SELECT nameid 
                             FROM schedule 
                            WHERE date > NOW() - INTERVAL 2 MONTH)
 GROUP BY n.id

UNION will remove duplicates -- if there's no concern about duplicates, use UNION ALL (it's faster, because it doesn't remove duplicates).

Additionally, the latter query has criteria against the OUTER JOIN'd table in the WHERE clause -- this criteria is applied after the JOIN. You might get different results using:

   SELECT n.name, MAX(s.date) 
     FROM names n
LEFT JOIN schedule s ON s.nameid = n.id
                    AND s.nameid NOT IN (SELECT nameid 
                                           FROM schedule 
                                          WHERE date > NOW() - INTERVAL 2 MONTH)
 GROUP BY n.id

In the example above, the criteria will be applied before the OUTER JOIN. For INNER JOINs, this can be ignored -- criteria in either spot is equivalent.

Upvotes: 1

Related Questions