Why does using HAVING eliminate outputs when the same condition works in an IF statement?

Question

Context

I am trying to solve the HackerRank problem at the link below:

https://www.hackerrank.com/challenges/challenges/problem

I am focusing on excluding the unwanted data from the output. The problem states:

If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.

I have split this up into two conditions:

The student will be included if they had the maximum count of challenges (regardless of how many students achieved this count) OR
The student will be included if their challenge count is unique

What I've tried

I am focusing on the first condition. This is the query I've come up with:

SELECT h.hacker_id, h.name, COUNT(c.challenge_id) as cnt
FROM Hackers as h
LEFT JOIN Challenges as c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name, c.challenge_id
HAVING /* Either the max, or things that are not duplicates*/ 
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
ORDER BY cnt DESC, hacker_id ASC

This results in no output to stdout.

The issue is with the HAVING statement. I tried checking the conditional by querying just the MAX(...) statement:

SELECT MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
FROM Challenges

I get an output.

Then I tried using the whole conditional in an IF() statement:

SELECT IF(
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1)),
    CONCAT(COUNT(c.challenge_id), ', it works',
    'No good'
)
FROM Challenges as c

This returns the same max value as the test of my MAX() clause.

Can anyone explain to me why the full query, with the conditional in the HAVING clause, returns no output?

FanoFN · Accepted Answer

I think you almost got it. A few changes and addition to your current query can return the correct result.

Your original query:

SELECT h.hacker_id, h.name, COUNT(c.challenge_id) as cnt
FROM Hackers as h
LEFT JOIN Challenges as c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name, c.challenge_id /*remove c.challenge_id from the group by */
HAVING  /*remove HAVING from the query*/
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
/*take the MAX function and modify the subquery by adding group by, order by and limit then move it to SELECT*/
ORDER BY cnt DESC, hacker_id ASC /*remove this for now*/

After modification of the above, you'll get something like this:

#1st sub-query
SELECT h.hacker_id, h.name, COUNT(*) AS cnt,
       MAX((SELECT COUNT(*) AS cnt FROM Challenges GROUP BY hacker_id ORDER BY cnt DESC LIMIT 1)) AS mxc
FROM Hackers AS h
LEFT JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name

Make the query above as a sub-query then construct another query to be used as validation on any hacker_id that have the same challenges_created count. This is the sub-query:

#2nd sub-query
SELECT hacker_id, COUNT(*) cnt FROM Challenges GROUP BY hacker_id

Then use LEFT JOIN on the 1st & 2nd query like this

SELECT A.hacker_id, A.name, A.cnt AS 'challenges_created' FROM
(SELECT h.hacker_id, h.name, COUNT(*) AS cnt,
       MAX((SELECT COUNT(c1.challenge_id) AS cnt FROM Challenges AS c1 GROUP BY hacker_id ORDER BY cnt DESC LIMIT 1)) AS mxc
FROM Hackers AS h
LEFT JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name) A 
LEFT JOIN (SELECT hacker_id, COUNT(*) cnt FROM Challenges GROUP BY hacker_id) V 
ON A.cnt=V.cnt AND A.hacker_id <> V.hacker_id AND V.cnt < A.mxc
WHERE V.cnt IS NULL
ORDER BY A.cnt DESC, A.hacker_id ASC

Lets observe the ON condition on this part of the LEFT JOIN. So ON A.cnt=V.cnt AND A.hacker_id <> V.hacker_id AND V.cnt < A.mxc literally mean "the count is the same but hacker id is not while the second result count is less than the max count from the first result". Add WHERE V.cnt IS NULL to return only the one doesn't match with the ON condition and finalize it with the ORDER BY.

I obtained the data used in the the challenge and created a fiddle from it here : https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=be99385a91d809b3daeaeb1ca3f54e0d

Btw,

"The main difference between WHERE and HAVING clause comes when used together with GROUP BY clause, In that case WHERE is used to filter rows before grouping and HAVING is used to exclude records after grouping.

Read more: https://www.java67.com/2019/06/difference-between-where-and-having-in-sql.html#ixzz6b01iR3NB"

Why does using HAVING eliminate outputs when the same condition works in an IF statement?

Answers (2)

Related Questions