Dave Chenell
Dave Chenell

Reputation: 601

SQL sub queries with WHERE NOT EXISTS

I want to select all the characters that your character HAS NOT challenged in the past 24 hours.

 SELECT * FROM challenges
 WHERE userCharID = 642 AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)

This returns a few rows with challenges that your character has initiated in the past day

SELECT characterID FROM CHARACTERS 
WHERE NOT EXISTS (SELECT * FROM challenges
                   WHERE userCharID = '610'
                     AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY))

Am I using WHERE NOT EXISTS wrong?

Upvotes: 2

Views: 6079

Answers (3)

spencer7593
spencer7593

Reputation: 108510

Your NOT EXIST query is really close. All you are missing is the correlation between the subquery and the outer query on characterID.

I've just added the alias c to the table on your outer query, an alias d to the table in your subquery, and added one predicate to the WHERE clause in your subquery

SELECT characterID FROM CHARACTERS c 
WHERE NOT EXISTS (SELECT * FROM challenges d
                   WHERE d.userCharID = '610'
                     AND d.chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
                     AND d.characterID = c.characterID)    

The "trick" here is the correlation matching d.characterID (from the table in the subquery) to c.characterID (from the table in the outer query.)

So, the query is checking for each character in that outer table, whether our user has had a challenge with that user in the last 24 hours. So, this query will return the result set you specified.

BUT... if you have a relatively large set of characters, and a relatively small set who have been challenged, this not likely going to be the fastest query that returns the result set.


Another approach to getting the result set is to use a LEFT JOIN with an IS NULL predicate (which we refer to as an "anti-join".) If this query:

SELECT d.characterID
  FROM challenges d
 WHERE d.userCharID = 642
   AND d.chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
 GROUP BY d.characterID

returns a list of all characterID that have been challenged, which is the set of characters that you want to EXCLUDE from the set of ALL characters, then you can use that query as an inline view, like this:

SELECT n.characterID
  FROM characters n
  LEFT
  JOIN (
         SELECT d.characterID
           FROM challenges d
          WHERE d.userCharID = 642
            AND d.chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
          GROUP BY d.characterID
       ) c
    ON c.characterID = n.characterID
 WHERE c.characterID IS NULL

Here we are getting the list of all characters (n), and matching them to the list of characters that have been challenged (subquery aliases as c). We use a LEFT JOIN operation because we want ALL rows from the characters table, whether a match is found or not.

The WHERE clause then throws out all the rows where we did find a match, so what we are left with is the set of characters who have not been challenged.


In my testing with large sets, this will usually outperforms a NOT EXISTS and a NOT IN (when appropriate indexes are available). But sometimes I find a NOT IN is faster, sometimes the NOT EXISTS is faster.

I find it's good to have all three approaches "in my pocket", and use whichever is most appropriate. I usually start with the anti-join pattern (it's what I'm used to writing), and then test both the NOT EXISTS and the NOT IN to compare performance.

Upvotes: 0

Jonathan Leffler
Jonathan Leffler

Reputation: 755064

Am I using WHERE NOT EXISTS wrong?

Yes. You want to be using NOT IN rather than NOT EXISTS. If you use NOT EXISTS and the non-existential sub-query returns any rows, then the condition will be false and no data will be returned by the main query. If no rows are returned, then the condition will be true and all rows will be returned by the main query (since, in this example, there are no other criteria in the main query). Often, the sub-query in a NOT EXISTS is a correlated sub-query, so the sub-query has to be evaluated for each row. Here, you don't have a correlated sub-query (which is good for performance). But your query means 'return information about all characters unless there exists some character who's been challenged in the last one day by the nominated user'.

(In this analysis, I've quietly changed the SQL so that userCharID is always compared with a string, and with the value '642' specifically.)

Select all the characters that your character [has] challenged in the past 24 hours:

SELECT *
  FROM Challenges
 WHERE userCharID = '642'
   AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)

This returns a few rows with challenges that your character has initiated in the past day.

So, to find all the people that you have not challenged, you need to select all the users except those in the list you have challenged, which translates to:

SELECT characterID
  FROM Characters 
 WHERE userCharID NOT IN
       (SELECT userCharID
          FROM Challenges
         WHERE userCharID = '642'
           AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
       )

This should give you the (possibly rather large) list of characters who you've not challenged within the last 24 hours.

Upvotes: 3

Harald Brinkhof
Harald Brinkhof

Reputation: 4455

WHERE NOT EXISTS in the context of a subquery returns TRUE or FALSE depending on the result.

If a subquery returns any rows at all, EXISTS subquery is TRUE, and NOT EXISTS subquery is FALSE.

in your case it means that if

(SELECT * FROM challenges
WHERE userCharID = '610' AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY))

returns any rows at all then

your query wil be evaluated as

SELECT characterID FROM CHARACTERS WHERE FALSE; 

which obviously is not what you want.

You can use the IN operator instead:

SELECT characterID FROM CHARACTERS 
WHERE characterID NOT IN (SELECT characterID FROM challenges
WHERE userCharID = '610' AND chalTime > DATE_SUB(CURDATE(), INTERVAL 1 DAY))

Where the second characterID (the one in the subquery) needs to be the field that corresponds to the characterID in your CHARACTERS table, this might be userCharID for you, though I doubt it, given your where clause. Without the schema I just can't tell for sure.

Other options at your disposal would be selecting directly from the subquery or in some cases getting your data through joins.

Upvotes: 2

Related Questions