Reputation: 75
I'm using MySQL and am currently stuck trying to understand how a piece of code works. This is regarding the Hackerrank SQL question titled "Challenges". The problem statement is as follows
Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.
I have found a working MySQL solution courtesy of this page That uses the following code:
SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt
FROM Hackers AS h JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name HAVING
cnt = (SELECT COUNT(c1.challenge_id) FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY COUNT(*) DESC LIMIT 1) OR
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;
As of now I understand the problem statement up until "If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result." I simply have no clue how to structure a query to solve that statement.
In the code provided above, I understand everything it does until this section
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
Can anyone please help me understand what this line accomplishes and the logic behind it? Specifically I don't know what c2.hacker_id <> c.hacker_id is supposed to do. I'm guessing the whole line selects the number of challenge_ids done by particular hacker_ids who aren't the same person, but I have no clue how that solves the query.
Upvotes: 1
Views: 908
Reputation: 11
select C.hacker_id, H.Hacker_Name, challengesCreated from Hackers H join (select hacker_id, count(challenge_id)challengesCreated from Challenges group by hacker_id) C on H.hacker_id = C.hacker_id where challengesCreated = (SELECT top 1 COUNT(c1.challenge_id)C1Cha FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY C1Cha DESC) OR challengesCreated NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id) order by challengesCreated desc, c.hacker_id;
Upvotes: 0
Reputation: 74680
Suppose you get a list of hacker ids and sub counts out of the query when you don't have this clause:
hacker, counter
1, 10
2, 9
3, 9
Two and three shouldn't be in there because they're tied on count, so we can implement it as excluding anyone who counted 9
Consider that conceptually the database will run the query for every row in the results: when processing hacker 2 row the query gets a list of challenge counts where someone whose id isnt 2. This means when considering hacker 2, the dB will pull back a list of the following counts:
10, --it comes from hacker 1
9 --it comes from hacker 3
The database then goes "i'm processing hacker 2, whose count is 9. I may only include hacker 2 in the results if hacker 2's count(9) is not in the following list of values: 10, 9. Oh, 9 is in the list of banned values. I'll exclude hacker 2 from the results
Repeat for hacker 3, this time a 9 count comes from hacker 2 so 3 is also excluded
Upvotes: 1
Reputation: 522386
Analytic functions greatly help with a question like this, so I will offer a solution using MySQL 8+, which, moving forward, will be the likely database which a reader of your question would be using (and HackerRank will at some point also be using MySQL 8+).
WITH cte AS (
SELECT
c.hacker_id,
h.name,
COUNT(c.challenge_id) AS cnt,
ROW_NUMBER() OVER (ORDER BY COUNT(c.challenge_id) DESC) rn,
MIN(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_min,
MAX(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_max
FROM Hackers AS h
INNER JOIN Challenges AS c
ON h.hacker_id = c.hacker_id
GROUP BY
c.hacker_id,
h.name
)
SELECT
hacker_id,
name,
cnt
FROM cte
WHERE
rn = 1 OR hacker_id_min = hacker_id_max
ORDER BY
cnt DESC,
c.hacker_id;
This answer words by computing a row number, sorted in descending order by the count. It also computes the min and max hacker_id
values for each partition of challenge counts. Records are retained if they belong to the highest count, regardless of ties for first place. And records are also retained if the given count is only associated with a single user.
Upvotes: 1