BumbleBee
BumbleBee

Reputation: 10779

Get the top n rows from each group

I have a table as following:

[ID] [Name] [Score] [Class] 
1     John    90        A   
2     Mary    63        A   
3     Tom     87        A   
4     David   98        A   
5     Mary    87        B   
6     David   77        B   
7     David   73        C   
8     Mary    92        C   
9     Tom     73        C   
10    John    79        C   
11    Mary    70        D   
12    Jane    85        D   
13    David   83        D   

I need to get the top 2 persons based on the scores in each class.

My expected output is

[ID] [Name] [Score] [Class] 
1     John    90        A   
4     David   98        A   
5     Mary    87        B   
6     David   77        B   
8     Mary    92        C   
10    John    79        C   
12    Jane    85        D   
13    David   83        D 

Here is what I tried so far but this is not producing the correct results

SELECT *
FROM   Student s
WHERE 
        (
            SELECT  COUNT(*) 
            FROM    Student  f
            WHERE f.name = s.name AND 
                  f.score >= s.score
        ) <= 2

Upvotes: 0

Views: 54

Answers (2)

spencer7593
spencer7593

Reputation: 108370

The specification said... "for each class"

The smallest change that would need to be made to the proposed query to get the the specified results "students with the two highest scores for each class", we'd need to replace one of the predicates in the correlated subquery... matching on [class] rather than on [name]...

  SELECT s.*
    FROM Student s
   WHERE 
         (
            SELECT COUNT(*) 
              FROM Student  f
             WHERE f.class = s.class
               AND f.score >= s.score
         ) <= 2

Note that if there are multiple students in the class with the same "highest scores", the query will return all of those students, not just two students for class.

If we specifically want to return "at most two students from each class, the students who have the highest score in each class", we'd need to write the query a little differently.

Given the example data, with no duplicate scores in a class, the results would be the same.

The difference can be demonstrated by adding a row to the example data, for example, adding Saul, having the same "second highest" score as David.

 11    Mary    70        D   
 12    Jane    85        D   
 13    David   83        D 
 14    Saul    83        D 

The question we need to ask about the specification... should only two students be returned for this class, Jane has the highest score, so obviously return Jane. But David and Saul have the same score. Do we return both, or if we only return one of them, does it matter which one we return?

Should we return three rows:

 [ID] [Name] [Score] [Class] 
 12    Jane    85        D   
 13    David   83        D 
 14    Saul    83        D 

because those are all of the students with the two highest scores, or should we return just two of the students with the highest scores:

 [ID] [Name] [Score] [Class] 
 12    Jane    85        D   
 13    David   83        D 

or

 [ID] [Name] [Score] [Class] 
 12    Jane    85        D   
 14    Saul    83        D 

Once that question is answered, we can write a query that returns the specified result.

And (obviously) this isn't the only query. There are other query patterns that will return an equivalent result... using either ANSI-standard syntax, or vendor specific extensions.

Upvotes: 1

Felix Pamittan
Felix Pamittan

Reputation: 31879

Use ROW_NUMBER:

SELECT
    ID, Name, Score, Class
FROM(
    SELECT *,
        rn = ROW_NUMBER() OVER(PARTITION BY Class ORDER BY Score DESC)
    FROM Student
) t
WHERE rn <= 2

DEMO

Upvotes: 1

Related Questions