Shah
Shah

Reputation: 75

Need help understanding COUNT and <> operators (Hackerrank SQL Challenges question)

I'm using MySQL and am currently stuck trying to understand how a piece of code works. This is regarding the Hackerrank SQL question titled "Challenges". The problem statement is as follows

Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.

I have found a working MySQL solution courtesy of this page That uses the following code:

SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt 
FROM Hackers AS h JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name HAVING
cnt = (SELECT COUNT(c1.challenge_id) FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY COUNT(*) DESC LIMIT 1) OR
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;

As of now I understand the problem statement up until "If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result." I simply have no clue how to structure a query to solve that statement.

In the code provided above, I understand everything it does until this section

 cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)

Can anyone please help me understand what this line accomplishes and the logic behind it? Specifically I don't know what c2.hacker_id <> c.hacker_id is supposed to do. I'm guessing the whole line selects the number of challenge_ids done by particular hacker_ids who aren't the same person, but I have no clue how that solves the query.

Upvotes: 1

Views: 908

Answers (3)

dominic
dominic

Reputation: 11

select C.hacker_id, H.Hacker_Name, challengesCreated from Hackers H join (select hacker_id, count(challenge_id)challengesCreated from Challenges group by hacker_id) C on H.hacker_id = C.hacker_id where challengesCreated = (SELECT top 1 COUNT(c1.challenge_id)C1Cha FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY C1Cha DESC) OR challengesCreated NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id) order by challengesCreated desc, c.hacker_id;

Upvotes: 0

Caius Jard
Caius Jard

Reputation: 74680

Suppose you get a list of hacker ids and sub counts out of the query when you don't have this clause:

hacker, counter 
1, 10
2, 9
3, 9

Two and three shouldn't be in there because they're tied on count, so we can implement it as excluding anyone who counted 9

Consider that conceptually the database will run the query for every row in the results: when processing hacker 2 row the query gets a list of challenge counts where someone whose id isnt 2. This means when considering hacker 2, the dB will pull back a list of the following counts:

10,  --it comes from hacker 1
9     --it comes from hacker 3

The database then goes "i'm processing hacker 2, whose count is 9. I may only include hacker 2 in the results if hacker 2's count(9) is not in the following list of values: 10, 9. Oh, 9 is in the list of banned values. I'll exclude hacker 2 from the results

Repeat for hacker 3, this time a 9 count comes from hacker 2 so 3 is also excluded

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522386

Analytic functions greatly help with a question like this, so I will offer a solution using MySQL 8+, which, moving forward, will be the likely database which a reader of your question would be using (and HackerRank will at some point also be using MySQL 8+).

WITH cte AS (
    SELECT
        c.hacker_id,
        h.name,
        COUNT(c.challenge_id) AS cnt,
        ROW_NUMBER() OVER (ORDER BY COUNT(c.challenge_id) DESC) rn,
        MIN(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_min,
        MAX(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_max
    FROM Hackers AS h
    INNER JOIN Challenges AS c
        ON h.hacker_id = c.hacker_id
    GROUP BY
        c.hacker_id,
        h.name
)

SELECT
    hacker_id,
    name,
    cnt
FROM cte
WHERE
    rn = 1 OR hacker_id_min = hacker_id_max
ORDER BY
    cnt DESC,
    c.hacker_id;

This answer words by computing a row number, sorted in descending order by the count. It also computes the min and max hacker_id values for each partition of challenge counts. Records are retained if they belong to the highest count, regardless of ties for first place. And records are also retained if the given count is only associated with a single user.

Upvotes: 1

Related Questions