spheroidic
spheroidic

Reputation: 199

Why does using HAVING eliminate outputs when the same condition works in an IF statement?

Context

I am trying to solve the HackerRank problem at the link below:

https://www.hackerrank.com/challenges/challenges/problem

I am focusing on excluding the unwanted data from the output. The problem states:

If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.

I have split this up into two conditions:

  1. The student will be included if they had the maximum count of challenges (regardless of how many students achieved this count) OR
  2. The student will be included if their challenge count is unique

What I've tried

I am focusing on the first condition. This is the query I've come up with:

SELECT h.hacker_id, h.name, COUNT(c.challenge_id) as cnt
FROM Hackers as h
LEFT JOIN Challenges as c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name, c.challenge_id
HAVING /* Either the max, or things that are not duplicates*/ 
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
ORDER BY cnt DESC, hacker_id ASC

This results in no output to stdout.

The issue is with the HAVING statement. I tried checking the conditional by querying just the MAX(...) statement:

SELECT MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
FROM Challenges 

I get an output.

Then I tried using the whole conditional in an IF() statement:

SELECT IF(
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1)),
    CONCAT(COUNT(c.challenge_id), ', it works',
    'No good'
)
FROM Challenges as c 

This returns the same max value as the test of my MAX() clause.

Can anyone explain to me why the full query, with the conditional in the HAVING clause, returns no output?

Upvotes: 0

Views: 108

Answers (2)

FanoFN
FanoFN

Reputation: 7114

I think you almost got it. A few changes and addition to your current query can return the correct result.

Your original query:

SELECT h.hacker_id, h.name, COUNT(c.challenge_id) as cnt
FROM Hackers as h
LEFT JOIN Challenges as c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name, c.challenge_id /*remove c.challenge_id from the group by */
HAVING  /*remove HAVING from the query*/
    COUNT(c.challenge_id) = MAX((SELECT COUNT(c1.challenge_id) FROM Challenges as c1))
/*take the MAX function and modify the subquery by adding group by, order by and limit then move it to SELECT*/
ORDER BY cnt DESC, hacker_id ASC /*remove this for now*/

After modification of the above, you'll get something like this:

#1st sub-query
SELECT h.hacker_id, h.name, COUNT(*) AS cnt,
       MAX((SELECT COUNT(*) AS cnt FROM Challenges GROUP BY hacker_id ORDER BY cnt DESC LIMIT 1)) AS mxc
FROM Hackers AS h
LEFT JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name

Make the query above as a sub-query then construct another query to be used as validation on any hacker_id that have the same challenges_created count. This is the sub-query:

#2nd sub-query
SELECT hacker_id, COUNT(*) cnt FROM Challenges GROUP BY hacker_id

Then use LEFT JOIN on the 1st & 2nd query like this

SELECT A.hacker_id, A.name, A.cnt AS 'challenges_created' FROM
(SELECT h.hacker_id, h.name, COUNT(*) AS cnt,
       MAX((SELECT COUNT(c1.challenge_id) AS cnt FROM Challenges AS c1 GROUP BY hacker_id ORDER BY cnt DESC LIMIT 1)) AS mxc
FROM Hackers AS h
LEFT JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY h.hacker_id, h.name) A 
LEFT JOIN (SELECT hacker_id, COUNT(*) cnt FROM Challenges GROUP BY hacker_id) V 
ON A.cnt=V.cnt AND A.hacker_id <> V.hacker_id AND V.cnt < A.mxc
WHERE V.cnt IS NULL
ORDER BY A.cnt DESC, A.hacker_id ASC

Lets observe the ON condition on this part of the LEFT JOIN. So ON A.cnt=V.cnt AND A.hacker_id <> V.hacker_id AND V.cnt < A.mxc literally mean "the count is the same but hacker id is not while the second result count is less than the max count from the first result". Add WHERE V.cnt IS NULL to return only the one doesn't match with the ON condition and finalize it with the ORDER BY.

I obtained the data used in the the challenge and created a fiddle from it here : https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=be99385a91d809b3daeaeb1ca3f54e0d

Btw,

"The main difference between WHERE and HAVING clause comes when used together with GROUP BY clause, In that case WHERE is used to filter rows before grouping and HAVING is used to exclude records after grouping.

Read more: https://www.java67.com/2019/06/difference-between-where-and-having-in-sql.html#ixzz6b01iR3NB"

Upvotes: 0

Somy
Somy

Reputation: 1624

The challenge was actually challenging :-). It took more time than I expected. Here is my solution below -

SELECT Tx.hacker_id, 
       Tx.NAME, 
       Tx.challenges 
FROM   (SELECT h.hacker_id                    AS hacker_id, 
               h.NAME                         AS NAME, 
               Count(DISTINCT c.challenge_id) AS challenges 
        FROM   hackers h 
               LEFT JOIN challenges c 
                      ON h.hacker_id = c.hacker_id 
        GROUP  BY h.hacker_id, 
                  h.NAME 
        HAVING Count(DISTINCT c.challenge_id) = (SELECT Max(challenges) 
                                                 FROM 
               (SELECT 
                      h.hacker_id, 
                      Count(DISTINCT c.challenge_id) AS challenges 
                                                         FROM   hackers h 
                                                LEFT JOIN challenges c 
                                                       ON h.hacker_id = 
                                                          c.hacker_id 
                                                         GROUP  BY h.hacker_id) T 
                                                 ) 
        UNION 
        SELECT T1.hacker_id  AS hacker_id, 
               T1.NAME       AS NAME, 
               T1.challenges AS challenges 
        FROM   (SELECT h.hacker_id, 
                       h.NAME, 
                       Count(DISTINCT c.challenge_id) AS challenges 
                FROM   hackers h 
                       LEFT JOIN challenges c 
                              ON h.hacker_id = c.hacker_id 
                GROUP  BY h.hacker_id, 
                          h.NAME) T1 
               JOIN (SELECT T.challenges 
                     FROM   (SELECT h.hacker_id, 
                                    h.NAME, 
                                    Count(DISTINCT c.challenge_id) AS challenges 
                             FROM   hackers h 
                                    LEFT JOIN challenges c 
                                           ON h.hacker_id = c.hacker_id 
                             GROUP  BY h.hacker_id, 
                                       h.NAME) T 
                     GROUP  BY T.challenges 
                     HAVING Count(DISTINCT T.hacker_id) = 1) T2 
                 ON T1.challenges = T2.challenges) Tx 
ORDER  BY 3 DESC, 
          1; 

Although the code looks long but the logic is quite simple. The upper part of the union computes those hacker_id and related info who submitted the maximum amount of chanllenges and the 2nd part of the union tackles to select only those hackers who have submitted non-max but unique number of challenges.

Upvotes: 1

Related Questions