Reputation: 93
I cannot for the life of me figure out how to get a weighted ranking for scores across X categories. For example, the student needs to answer 10 questions across 3 categories (both # of questions and # of categories will be variable eventually). To get a total score the top 1 score in each of the X (3) categories will be added to whatever is left to add up to 10 total question scores.
Here is the data. I used a CASE WHEN Row_Number() to get the TopInCat
http://sqlfiddle.com/#!6/e6e9f/1
The fiddle has more students.
| Question | Student | Category | Score | TopInCat | |----------|---------|----------|-------|----------| | 120149 | 125 | 6 | 1 | 1 | | 120127 | 125 | 6 | 0.9 | 0 | | 120124 | 125 | 6 | 0.8 | 0 | | 120125 | 125 | 6 | 0.7 | 0 | | 120130 | 125 | 6 | 0.6 | 0 | | 120166 | 125 | 6 | 0.5 | 0 | | 120161 | 125 | 6 | 0.4 | 0 | | 120138 | 125 | 4 | 0.15 | 1 | | 120069 | 125 | 4 | 0.15 | 0 | | 120022 | 125 | 4 | 0.15 | 0 | | 120002 | 125 | 4 | 0.15 | 0 | | 120068 | 125 | 2 | 0.01 | 1 | | 120050 | 125 | 3 | 0.05 | 1 | | 120139 | 125 | 2 | 0 | 0 | | 120156 | 125 | 2 | 0 | 0 |
This is how I envision it needs to look, but it doesn't have to be exactly this. I just need to have 10 questions by 3 categories detail data in a way that would allow me to sum and average the Sort 1-10 column below. The 999's could be null or whatever as long as I can sum whats important and present the details.
| Question | Student | Category | Score | TopInCat | Sort | |----------|---------|----------|-------|----------|------| | 120149 | 125 | 6 | 1 | 1 | 1 | | 120138 | 125 | 4 | 0.15 | 1 | 2 | | 120068 | 125 | 2 | 0.01 | 1 | 3 | | 120127 | 125 | 6 | 0.9 | 0 | 4 | | 120124 | 125 | 6 | 0.8 | 0 | 5 | | 120125 | 125 | 6 | 0.7 | 0 | 6 | | 120130 | 125 | 6 | 0.6 | 0 | 7 | | 120166 | 125 | 6 | 0.5 | 0 | 8 | | 120161 | 125 | 6 | 0.4 | 0 | 9 | | 120069 | 125 | 4 | 0.15 | 0 | 10 | | 120022 | 125 | 4 | 0.15 | 0 | 999 | | 120002 | 125 | 4 | 0.15 | 0 | 999 | | 120050 | 125 | 3 | 0.05 | 1 | 999 | | 120139 | 125 | 2 | 0 | 0 | 999 | | 120156 | 125 | 2 | 0 | 0 | 999 |
One last thing, the category no longer matters once the X (3) threshold is met. So a 4th category would just sort normally.
| Question | Student | Category | Score | TopInCat | Sort | |----------|---------|----------|-------|----------|------| | 120149 | 126 | 6 | 1 | 1 | 1 | | 120138 | 126 | 4 | 0.75 | 1 | 2 | | 120068 | 126 | 2 | 0.50 | 1 | 3 | | 120127 | 126 | 6 | 0.9 | 0 | 4 | | 120124 | 126 | 6 | 0.8 | 0 | 5 | | 120125 | 126 | 6 | 0.7 | 0 | 6 | | 120130 | 126 | 6 | 0.6 | 0 | 7 | | 120166 | 126 | 6 | 0.5 | 0 | 8 | | 120050 | 126 | 3 | 0.45 | 1 | 9 |******** | 120161 | 126 | 6 | 0.4 | 0 | 10 | | 120069 | 126 | 4 | 0.15 | 0 | 999 | | 120022 | 126 | 4 | 0.15 | 0 | 999 | | 120002 | 126 | 4 | 0.15 | 0 | 999 | | 120139 | 126 | 2 | 0 | 0 | 999 | | 120156 | 126 | 2 | 0 | 0 | 999 |
I really appreciate any help. Been banging my head on this for a few days.
Upvotes: 4
Views: 89
Reputation: 63378
With such matters I like to proceed with a 'building blocks' approach. Following the maxim of first make it work, then if you need to make it fast, this first step is often enough.
So, given
CREATE TABLE WeightedScores
([Question] int, [Student] int, [Category] int, [Score] dec(3,2))
;
and your sample data
INSERT INTO WeightedScores
([Question], [Student], [Category], [Score])
VALUES
(120161, 123, 6, 1), (120166, 123, 6, 0.64), (120138, 123, 4, 0.57), (120069, 123, 4, 0.5),
(120068, 123, 2, 0.33), (120022, 123, 4, 0.18), (120061, 123, 6, 0), (120002, 123, 4, 0),
(120124, 123, 6, 0), (120125, 123, 6, 0), (120137, 123, 6, 0), (120154, 123, 6, 0),
(120155, 123, 6, 0), (120156, 123, 6, 0), (120139, 124, 2, 1), (120156, 124, 2, 1),
(120050, 124, 3, 0.88), (120068, 124, 2, 0.87), (120161, 124, 6, 0.87), (120138, 124, 4, 0.85),
(120069, 124, 4, 0.51), (120166, 124, 6, 0.5), (120022, 124, 4, 0.43), (120002, 124, 4, 0),
(120130, 124, 6, 0), (120125, 124, 6, 0), (120124, 124, 6, 0), (120127, 124, 6, 0),
(120149, 124, 6, 0), (120149, 125, 6, 1), (120127, 125, 6, 0.9), (120124, 125, 6, 0.8),
(120125, 125, 6, 0.7), (120130, 125, 6, 0.6), (120166, 125, 6, 0.5), (120161, 125, 6, 0.4),
(120138, 125, 4, 0.15), (120069, 125, 4, 0.15), (120022, 125, 4, 0.15), (120002, 125, 4, 0.15),
(120068, 125, 2, 0.01), (120050, 125, 3, 0.05), (120139, 125, 2, 0), (120156, 125, 2, 0),
(120149, 126, 6, 1), (120138, 126, 4, 0.75), (120068, 126, 2, 0.50), (120127, 126, 6, 0.9),
(120124, 126, 6, 0.8), (120125, 126, 6, 0.7), (120130, 126, 6, 0.6), (120166, 126, 6, 0.5),
(120050, 126, 3, 0.45), (120161, 126, 6, 0.4), (120069, 126, 4, 0.15), (120022, 126, 4, 0.15),
(120002, 126, 4, 0.15), (120139, 126, 2, 0), (120156, 126, 2, 0)
;
let's proceed.
The complicated part here is identifying the top three top-in-category questions; the others of the ten questions of interest per student are simply sorted by score, which is easy. So let's start with identifying the top three top-in-category questions.
First, assign to each row a row number giving the ordering of that score within the category, for the student:
;WITH Numbered1 ( Question, Student, Category, Score, SeqInStudentCategory ) AS
(
SELECT Question, Student, Category, Score
, ROW_NUMBER() OVER (PARTITION BY Student, Category ORDER BY Score DESC) SeqInStudentCategory
FROM WeightedScores
)
Now we are only interested in rows where SeqInStudentCategory
is 1
. Considering only such rows, let's order them by score within student, and number those rows:
-- within the preceding WITH
, Numbered2 ( Question, Student, Category, Score, SeqInStudent ) AS
(
SELECT
Question, Student, Category, Score
, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
FROM
Numbered1
WHERE
SeqInStudentCategory = 1
)
Now we are only interested in rows where SeqInStudent
is at most 3
. Let's pull them out, so that we know to include it (and exclude it from the simple sort by score, that we will use to make up the remaining seven rows):
-- within the preceding WITH
, TopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
SELECT Question, Student, Category, Score, SeqInStudent FROM Numbered2 WHERE SeqInStudent <= 3
)
Now we have the three top-in-category questions for each student. We now need to identify and order by score the not top-in-category questions for each student:
-- within the preceding WITH
, NotTopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
SELECT
Question, Student, Category, Score
, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
FROM
WeightedScores WS
WHERE
NOT EXISTS ( SELECT 1 FROM TopInCat T WHERE T.Question = WS.Question AND T.Student = WS.Student )
)
Finally we combine TopInCat
with NotTopInCat
, applying an appropriate offset and restriction to NotTopInCat.SeqInStudent
- we need to add 3
to the raw value, and take the top 7
(which is 10 - 3
):
-- within the preceding WITH
, Combined ( Question, Student, Category, Score, CombinedSeq ) AS
(
SELECT
Question, Student, Category, Score, SeqInStudent AS CombinedSeq
FROM
TopInCat
UNION
SELECT
Question, Student, Category, Score, SeqInStudent + 3 AS CombinedSeq
FROM
NotTopInCat
WHERE
SeqInStudent <= 10 - 3
)
To get our final results:
SELECT * FROM Combined ORDER BY Student, CombinedSeq
;
You can see the results on sqlfiddle.
Note that here I have assumed that every student will always have answers from at least three categories. Also, the final output doesn't have a TopInCat
column, but hopefully you will see how to regain that if you want it.
Also, "(both # of questions and # of categories will be variable eventually)" should be relatively straightforward to deal with here. But watch out for my assumption that (in this case) 3 categories will definitely be present in the answers of each student.
Upvotes: 2