pablocity
pablocity

Reputation: 504

How to get MAX value out of the GROUPs COUNT

I've recently started to learn tsql beyond basic inserts and selects, I have test database that I train on, and there is one query that I can't really get to work.

There are 3 tables used in that query, in the picture there are simplified fields and relations

I have 2 following queries - first one is simply displaying students and number of marks from each subject. Second is doing almost what I want to achive - shows students and maxiumum amount of marks they got, so ex. subject1 - (marks) 1, 5, 3, 4 count - 4 subject2 - (marks) 5, 4, 5 - count - 3 Query shows 4 and from what I checked it returns correct results, but I want one more thing - just to show the name of the subject from which there is maximum amount of marks so in the example case - subject1

--Query 1--
SELECT s.Surname, subj.SubjectName, COUNT(m.Mark) as Marks_count
FROM marks m, students s, subjects subj
WHERE m.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
GROUP BY s.Surname, subj.SubjectName
ORDER BY s.Surname

--Query 2--

SELECT query.Surname, MAX(Marks_count) as Maximum_marks_count FROM (SELECT s.Surname, subj.SubjectNumber, COUNT(m.Mark) as Marks_count
                                    FROM marks m, students s, subjects subj
                                    WHERE marks.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
                                    GROUP BY s.Surname, subj.SubjectName) as query
GROUP BY query.Surname
ORDER BY query.Surname


--Query 3 - not working as supposed--
SELECT query.Surname, query.SubjectName, MAX(Marks_count) as Maximum_marks_count FROM (SELECT s.Surname, subj.SubjectNumber, COUNT(m.Mark) as Marks_count
                                    FROM marks m, students s, subjects subj
                                    WHERE marks.StudentId = s.StudentNumber and subj.SubjectNumber = m.SubjectId
                                    GROUP BY s.Surname, subj.SubjectName) as query
GROUP BY query.Surname, query.SubjectName
ORDER BY query.Surname

Part of the query 1 result

Part of the query 1 result

Part of the query 2 and unfortunately query 3 result

Part of the query 2 and unfortunately query 3 result

The problem is that when I add to the select statement subject name I got results as from query one - there is no more maximum amount of marks just students, subjects and amount of marks from each subject.

If someone could say what I'm missing, I will much appreciate :)

Upvotes: 0

Views: 131

Answers (2)

Caius Jard
Caius Jard

Reputation: 74605

Here's a query that gets the highest mark per student, put it at the top of your sql file/batch and it will make another "table" you can join to your other tables to get the student name and the subject name:

WITH studentBest as 
SELECT * FROM(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY studentid ORDER BY mark DESC) rown  
  FROM marks) a 
WHERE rown = 1)

You use it like this (for example)

--the WITH bit goes above this line
SELECT *
FROM
  studentBest sb 
  INNER JOIN
  subject s
  ON sb.subjectid = s.subjectnumber

Etc

That's also how you should be doing your joins

How does it work? Well.. it establishes an incrementing counter that restarts every time studentid changes (the partition clause) and the numberin goes in des ending mark order (the order by clause). An outer query selects only those rows with 1 in the row number, ie the top mark per student

Why can't I use group by?

You can, but you have to write a query that summarises the marks table into the top mark (max) per student and then you have to join that data back to the mark table to retrieve the subject and all in it's a lot more faff, often less efficient

What if there are two subjects with the same mark?

Use RANK instead of ROW_NUMBER if you want to see both

Edit in response to your comment:

An extension of the above method:

SELECT * FROM
(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY su, st ORDER BY c DESC) rn FROM
  (
    SELECT studentid st, subjectid su, count(*) c
    FROM marks
    GROUP BY st, su
  ) a
) b
INNER JOIN student stu  on b.st = stu.studentnumber
INNER JOIN subject sub on b.su = sub.subjectnumber
WHERE
  b.rn = 1

We count the marks by student/subject, then rownumber them in descending order of count per student-subject pair, then choose only the first row and join in the other wanted data

Upvotes: 1

pablocity
pablocity

Reputation: 504

Ok thanks to Caius Jard, some other Stack's question and a little bit of experiments I managed to write working query, so this is how I did it.

First I created view from query1 and added one more column to it - studentId. Then I wrote query which almost satisfied me. That question helped me a lot with that task: Question

SELECT  marks.Surname,
    marks.SubjectName,
    marks.Marks_count,
    ROW_NUMBER() OVER(PARTITION BY marks.Surname ORDER BY marks.Surname) as RowNum
FROM MarksAmountPerStudentAndSubject marks
INNER JOIN (SELECT MarksAmountPerStudentAndSubject.Id, 
            MAX(MarksAmountPerStudentAndSubject.Marks_count) as MaxAmount
        FROM MarksAmountPerStudentAndSubject
        GROUP BY MarksAmountPerStudentAndSubject.Id) m
ON m.Id = marks.Id and marks.Marks_count = m.MaxAmount

It gives following results

enter image description here

That's what I wanted to achieve with one exception - if students have the same amount of marks from multiple subjects it displays all of them - thats fine but I decided to restrict this to the first result for each student - I couldn't just simply put TOP(1) there so I used similar solution that Caius Jard showed - ROW_NUMBER and window function - it gave me a chance to choose records that has row number equals to 1. I created another view from this query and I could simply write the final one

SELECT marks.Surname, marks.SubjectName, marks.Marks_count
FROM StudentsMaxMarksAmount marks
WHERE marks.RowNum = 1
ORDER BY marks.Surname

With result

enter image description here

Upvotes: 0

Related Questions